The optimal reconstruction of cosmic metric perturbations and other signals requires knowledge
of their power spectra and other parameters. If these are not known a priori, they have to be 
measured simultaneously from the same data used for the signal reconstruction. We formulate the 
general problem of signal inference in the presence of unknown parameters within the framework 
of information field theory. To solve this, we develop a generic parameter uncertainty renormalized 
estimation (PURE) technique. As a concrete application, we address the problem of reconstruct- 
ing Gaussian signals with unknown power-spectrum with five different approaches: (i) separate 
maximum-a-posteriori power spectrum measurement and subsequent reconstruction, (ii) maximum-
a-posteriori reconstruction with marginalized power-spectrum, (iii) maximizing the joint posterior
of signal and spectrum, (iv) guessing the spectrum from the variance in the Wiener filter map, and 
(v) renormalization flow analysis of the field theoretical problem providing the PURE filter. In all
cases, the reconstruction can be described or approximated as Wiener filter operations with assumed
signal spectra derived from the data according to the same recipe, but with differing coefficients. 
All of these filters, except the renormalized one, exhibit a perception threshold in case of a Jeffreys 
prior for the unknown spectrum. Data modes with variance below this threshold do not affect the 
signal reconstruction at all. Filter (iv) seems to be similar to the so-called Karhunen-Loeve and
Feldman-Kaiser-Peacock estimators for galaxy power spectra used in cosmology, which therefore 
should also exhibit a marginal perception threshold if correctly implemented. We present statistical 
performance tests and show that the PURE filter is superior to the others, especially if the post- 
Wiener filter corrections are included or in case an additional scale-independent spectral smoothness 
prior can be adopted. 



I. INTRODUCTION 
A. The generic sensing problem 

Reception of a signal is strongly aided by prior knowl-
edge of the signal's properties. This is especially true
in low signal to noise (S/N) situations, in which proper 
knowledge can make the difference between recognition 
of a signal and blindness. Our human senses like vision 
and hearing are strongly enhanced by our knowledge on 
the possible signals present in the data-stream entering 
the human brain. The very same is true for signal recep- 
tion by artificial sensor systems, since signal knowledge 
permits us to construct optimal filters, suppressing the 
noise as far as possible while focusing on the data modes 
with stronger S/N. If sufficient training data are avail- 
able, or theoretical reasoning permits us to predict signal 
properties, optimal filter design is possible and relatively 
straightforward. 

However, there are situations, where such knowledge 
is not available, or is to be excluded on purpose from 
the analysis, in order to have a prejudice-free signal re- 
construction. In such a situation the required parame- 
ters have to be measured simultaneously from the same 
data which is used for the signal reconstruction. Due to 
the interdependence of reconstructed signal and parame- 
ters, the problem becomes non-trivial and in general non- 
linear, even if the original inference problem was linear 
for fixed parameter values. 

Let us provide a concrete example in cosmology. The
cosmic matter distribution and its imprinted metric fluc- 
tuations on large scales can be well approximated to be 
a Gaussian random field obeying statistical isotropy and 
homogeneity. Knowledge of the power spectrum of these 
fields permits us to construct optimal and linear recon- 
struction filters for data of any linear tracers like the 
cosmic microwave background, the galaxy distribution 
(approximately), or the gravitational lensing signature.
For a set of cosmological parameters (e.g. Hubble con- 
stant, cosmic matter content, ...) these power spectra 
are known and can be used. However, the cosmologi- 
cal parameters themselves are not precisely known, and 
our best knowledge might come from the data-set we are 
analyzing. Furthermore, if we want to be open for non- 
standard cosmological scenarios, we might not want to 
put any prior assumption on the functional form of the 
power spectrum into our signal reconstruction problem. 

Therefore, we need signal reconstruction methods
which are capable of dealing with uncertainties in the
parameters of the problem. Such methods would be very
useful in many situations where prior knowledge on signal
properties is absent or should be avoided. Some loss
in fidelity compared to the case where these parameters
are known can be expected. However, such methods can
be expected to be flexible and robust due to their generic
nature and self-tuning abilities.

For the problem of the reconstruction of the cosmic
large-scale structure, the key parameter is the cosmic 
matter power-spectrum. It is known in the field of signal 
detection, that a statistical verification of the presence of 
a signal due to an increase in the data variance is possible 
well before the signal can be reconstructed itself. Thus,
a measurement of the signal power-spectrum is already 
possible while the S/N-ratio is too low for map-making, 
and is therefore immediately available for filter optimiza- 
tion as soon as the critical S/N-ratio is achieved. 

B. Derived filters 

The signal reconstruction filters derived in this work 
can all be regarded as or approximated by an application 
of a data-dependent Wiener filter operator onto the data, 
which results in a non-linear transformation of the data. 
The Wiener filter construction requires the knowledge of 
the signal covariance, or spectrum, the instrument re- 
sponse and the noise covariance. The signal covariance 
has to be extracted from the data itself, and therefore 
introduces a data dependence into the filter. The five fil- 
ters presented in this work differ in the way the assumed 
covariance is constructed, due to the different philoso- 
phies: 

1. MAP spectrum filter: The maximum a posteri- 
ori (MAP) of the spectrum given the data should be 
a reasonable guess for the signal spectrum assumed 
in the Wiener reconstruction. 

2. Classical map: The inference problem should be 
marginalized over all possible power spectra. In 
doing so, and deriving the classical filter equation 
by extremizing the resulting effective posterior, a 
data-dependent Wiener filter is derived, in which an 
effective spectrum emerges. This spectrum differs 
in general from the MAP-spectrum. 

3. Joint MAP filter: Instead of marginalizing the 
joint posterior of signal and spectrum and then ex- 
tremizing it with respect to one of those, we can 
maximize it with respect to both, leading to the 
joint MAP filter. 

4. Critical filter: This filter results if one requires 
the covariance of the Wiener filter map to ex- 
hibit exactly its expected variance, while taking 
the power loss due to the filter operation into ac- 
count. The critical filter implements accurately 
the idea behind frequently used power spectrum 
estimation schemes used in cosmology, like the 
Karhunen-Loeve (KL, [13, 23, 31]) and Feldman-
Kaiser-Peacock (FKP, [5]) estimators. In case of a
Jeffreys prior on the spectral normalisation, it exhibits
a marginal perception threshold and marks
the demarcation line between filters with such a
threshold, as the three above, and filters without
one, as the next.

5. PURE filter: Our ultimate filter would implement
the Bayesian mean of the signal posterior
marginalized over the unknown spectral parameter.
Only this provides the optimal reconstruction
algorithm in the sense of minimizing the
reconstruction error variance. This can only be done
by a full field theoretical treatment which incorporates
spectrum-uncertainty effects correcting for
imbalances of the induced errors due to over- and
underestimations of the signal spectrum. Here,
we incorporate such a correction by virtue of an
uncertainty-renormalization calculation. The resulting
parameter uncertainty renormalized estimation
(PURE) filter appears to be a Wiener filter only
in case an infinitesimal amount of uncertainty
is added. The renormalized-optimal spectrum,
as a fixed point of this uncertainty-adding operation,
is different from the spectra of the other filters.
In case a finite amount of uncertainty is added,
the PURE filter contains correction terms which
cannot be described exactly as Wiener filtering.



C. Previous works 

The PURE approach is derived within information 
field theory (IFT). This deals with the information of 
data on spatially distributed quantities, and is a sta- 
tistical field theory. The connection of inference prob- 
lems and statistical field theories was discovered indepen- 
dently by several authors in cosmology pQ [51 [TU] , statis- 
tical field theory [2H1], and quantum mechanics [TSU21j . 
A pedagogical introduction into IFT can be found in [5] . 

The uncomfortable dependence of information theoretical
methods on signal prior information has led several
authors to think about methods to extract this information
at least partly from the data. For example, a smoothness
prior for the signal can be used, where an "optimal"
value for the smoothness controlling parameter derives
from the data themselves [5]. The optimal smoothness
constraint for a Gaussian signal is provided by its
covariance, as known from Wiener filter theory [33]. A natural
proposal is therefore to measure the power spectrum
(or any characteristics of the signal covariance) from the
data and to use this for Wiener filtering or other signal
reconstruction methods [22, 26, 27, 30]. Data gaps
complicate the power spectrum measurement step, but
extensions of such methods to this case exist. However,
a more theoretical understanding of the inference
problem and the assumptions implicitly made by these 
methods would be beneficial to answer several questions. 
How should the spectrum be measured optimally? How 
can spectral prior information be incorporated into the 
filter? And is the best spectral estimator really the best 
choice for the spectrum assumed in the Wiener filter? 

Only Bayesian approaches, which explicitly deal
with all relevant prior information, can answer these
questions accurately. For example, it is possible to use
the MAP approach to the problem of Wiener filtering
if the overall amplitude of the signal covariance is
unknown, even on a logarithmic scale [14]. For a white
signal, where all pixels are statistically independent, this
can be generalized to the case that all pixel amplitudes
are drawn from a scale-free distribution function [28].

In precision cosmology, the problem of inferring the im- 
age and its power spectrum simultaneously is very promi- 
nent in cosmic microwave studies and cosmography of the 
large scale structure. It has been addressed rigorously 
via the Gibbs sampling scheme [71 [TTJ [T^ [35]. Since 
this approach samples the full joint posterior of maps 
and spectra, it provides the full solution to the problem. 
However, the computational costs of Gibbs sampling are 
high. Also obtaining analytical insights into the general 
behavior of the scheme is not trivial. Computationally 
cheaper and analytically simpler, or even just alternative 
methods are therefore interesting, and some of the
algorithms provided by this work are good candidates.



D. Structure of the work 

We introduce IFT with parameter uncertainties in Sec. II.
In Sec. III the problem of signal spectrum uncertainty
is introduced, and four of the mentioned filters are
derived from MAP principles. To go beyond the MAP
approximation the generic PURE approach is developed
in Sec. IV, where for any case with fourth order interaction
terms the generic uncertainty renormalization flow
equation is provided. The specific application of this
approach is given in Sec. V, where the PURE filter for the
problem of reconstruction without spectral knowledge is
derived. The perception thresholds of all these filters
are investigated in Sec. VI and their fidelity in Sec. VII,
where also a PURE filter with spectral smoothness prior
is presented. Finally, we conclude in Sec. VIII.



II. INFORMATION FIELD THEORY WITH PARAMETER UNCERTAINTY
A. Information field theory



We briefly introduce the concepts of IFT and extend
them to the case of parameter uncertainties. A more
pedagogical introduction, as well as more details on
terminology and notation of the framework, can be found
in [5]. An information field is simply a spatially
extended signal, where a signal s is any quantity a scientist
might be interested in measuring. We treat the signal
$s(x) = s_x$, a function of a spatial coordinate x, as an
abstract vector in Hilbert space with the scalar product
$j^\dagger s \equiv \int dx\, j(x)\, s(x)$.

The goal of IFT is to make statements on the signal 
field, which is constrained by prior knowledge and obser- 
vational data. Since we are usually dealing with a finite 
number of noisy data points, a precise reconstruction of a 
signal field with its infinite number of degrees of freedom 
is rarely possible. Our aim is therefore to investigate the 
probability function of s given the data d, the so-called
posterior P(s|d). The posterior is usually constructed
from the signal prior P(s) and the likelihood of the data
P(d|s) using Bayes' theorem

$$ P(s|d) = \frac{P(d|s)\, P(s)}{P(d)}. \tag{1} $$

The normalisation constant here, the so-called evidence
P(d), is given by a marginalization over the signal field,

$$ P(d) = \int \mathcal{D}s\; P(d,s), \tag{2} $$

where P(d,s) = P(d|s) P(s) is the joint probability
density function of data and signal. The phase space or path
integral $\int \mathcal{D}s$ goes over all possible signal field
configurations, weighted with P(d,s).

In IFT, we rewrite Bayes' theorem in the language of a
statistical field theory, namely as

$$ P(s|d) = \frac{e^{-H[s]}}{Z}, \tag{3} $$

where the information Hamiltonian $H[s] = -\log P(d,s)$
and the partition function $Z = P(d)$ are actually only a
renaming of (the negative logarithm of) the joint
probability and evidence. This change in language, however,
permits us to transfer many results from statistical field
theory to tackle IFT problems.

The goal of an IFT analysis could be to calculate
moments of the signal field averaged in a similar path
integral over the posterior P(s|d), e.g. in order to know the
mean signal

$$ \langle s \rangle_{(s|d)} = \int \mathcal{D}s\; s\, P(s|d). \tag{4} $$

This mean is of special interest, since it is optimal in
an $L^2$-error norm sense. It minimizes the expected error
variance $\langle (s - m)^\dagger (s - m) \rangle_{(s|d)}$ among all possible m.

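In practical applications, we often discretize the signal
field in $N_{\rm pix}$ pixels at locations $x_i$. Then the discretized
path integral for any signal function f(s) is

$$ \int \mathcal{D}s\; f(s) \equiv \left( \prod_{i=1}^{N_{\rm pix}} \int ds(x_i) \right) f\big(s(x_1), \ldots, s(x_{N_{\rm pix}})\big). $$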



If possible, we try to avoid evaluating such very high
dimensional integrals numerically. We use the fact that
a multivariate Gaussian probability density function as
given by

$$ \mathcal{G}(s, S) = \frac{1}{|2\pi S|^{1/2}} \exp\left( -\frac{1}{2}\, s^\dagger S^{-1} s \right) \tag{5} $$

(with |S| denoting the determinant of the matrix S) can
be integrated analytically: $\int \mathcal{D}s\; \mathcal{G}(s, S) = 1$. Many
functional integrals can be derived from this, like the
moments of a Gaussian, and path-integrals of any quadratic
functional of the integrated field. Non-quadratic
exponents can be expanded around the multivariate Gauss
integral in terms of diagrammatic perturbation series. For
further details, the reader is referred to [5] and any
standard book on field theory.

In the simplest case of the theory, signal and noise are 
independent Gaussian random variables, and the data 
depend linearly on them. This so-called free theory can 
be treated analytically and is our starting point. It has 
been analyzed in depth before and leads to the so called 
Wiener-filter theory [33]. However, usually the assump-
tion that all parameters p of the problem like instrument
calibration, or signal covariance, are known is used. This 
assumption will be dropped in the following, and we will 
see, that the otherwise trivial case gets interesting com- 
plications and the corresponding free IFT is enriched by 
interaction terms. 



B. Free theory from a Gaussian data model 

We assume that the signal we want to reconstruct is
a Gaussian random field, with a probability distribution
prior to any measurement described as $P(s|p) = \mathcal{G}(s, S_p)$,
where $S_p = \langle s\, s^\dagger \rangle_{(s|p)}$ is the signal covariance given the
parameter p, which itself might be a vector or even a field
over some space. The subscript (s|p) on the brackets of
the expectation value indicates that the average should be
done over the probability distribution P(s|p). Thus, the
individual elements of the signal covariance matrix read

$$ (S_p)_{xy} = \langle s(x)\, s(y) \rangle_{(s|p)} \equiv \int \mathcal{D}s\; s(x)\, s(y)\, P(s|p). $$

We further assume that the signal is processed by a
linear measurement device with response matrix R and
additive noise n according to:

$$ d = R\, s + n. \tag{6} $$

In general, response and noise can also depend on
unknown parameters and the general theory developed here
can also be applied to that case. To focus the discussion,
we only consider here the concrete example of a
parameter dependent signal covariance, and assume the response
and noise statistics to be known. We assume the noise
to be signal-independent and Gaussian, and thus

$$ P(n|s,p) = \mathcal{G}(n, N), \tag{7} $$

where $N = \langle n\, n^\dagger \rangle_{(n)}$ is the noise covariance matrix. Since
the noise is just the difference of the data to the
signal-response, n = d - R s, the likelihood of the data is

$$ P(d|s,p) = P(n = d - R\, s\,|\,s,p) = \mathcal{G}(d - R\, s, N). \tag{8} $$

The information Hamiltonian as defined in [5] is the
negative logarithm of the joint probability function of
signal and data for given and fixed parameters:

$$ H_p[s] \equiv -\log P(d,s|p) = -\log\left[ P(d|s,p)\, P(s|p) \right]. \tag{9} $$

Thus the Hamiltonian of the Gaussian theory,

$$ H_p[s] = \frac{1}{2}\, s^\dagger D_p^{-1} s - j^\dagger s + H_{0,p}, \tag{10} $$

is only quadratic in the signal, and therefore corresponds
to a free field theory. Here

$$ D_p = \left[ S_p^{-1} + M \right]^{-1}, \quad\text{with}\quad M = R^\dagger N^{-1} R, \tag{11} $$

is the information propagator, which depends on the
unknown spectral parameters. The information source,

$$ j = R^\dagger N^{-1} d, \tag{12} $$

depends linearly on the data in a response-over-noise
weighted fashion. Finally,

$$ H_{0,p} = \frac{1}{2}\, d^\dagger N^{-1} d + \frac{1}{2} \log\left( |2\pi S_p|\, |2\pi N| \right) \tag{13} $$

absorbs all s-independent normalization constants. It
cannot be ignored here, since it depends on p.

The key quantity, from which all relevant moments of
the signal can be estimated, is the partition function,

$$ Z_p[J] \equiv \int \mathcal{D}s\; e^{-H_p[s] + J^\dagger s}. \tag{14} $$

For the free field theory the partition function is

$$ Z_p[J] = \sqrt{|2\pi D_p|}\, \exp\left\{ +\frac{1}{2}\, (J + j)^\dagger D_p\, (J + j) - H_{0,p} \right\}. \tag{15} $$

This explicit formula permits us to calculate the
expectation of the signal given the data (and the parameters),
in the following called the map $m_p$:

$$ m_p = \langle s \rangle_{(s|d,p)} = \left. \frac{\delta \log Z_p[J]}{\delta J} \right|_{J=0} = D_p\, j = \left[ S_p^{-1} + R^\dagger N^{-1} R \right]^{-1} R^\dagger N^{-1} d. \tag{16} $$

The last expression shows that the map is given by the
data after applying a generalized Wiener filter, $m_p = F_p\, d$,
which depends on the parameter p of the signal
covariance.

Similarly, the quadratic uncertainty of the signal map
can be worked out. It turns out that for a free theory it
is the propagator itself:

$$ \langle (s - m_p)(s - m_p)^\dagger \rangle_{(s|d,p)} = \langle s\, s^\dagger \rangle_{(s|d,p)} - m_p\, m_p^\dagger = D_p. \tag{17} $$

The first identity follows from
$\langle s\, m_p^\dagger \rangle_{(s|d,p)} = \langle s \rangle_{(s|d,p)}\, m_p^\dagger = m_p\, m_p^\dagger$
due to the fact that the reconstructed map $m_p$ is solely
determined by the data, and therefore given in this average. The
second identity holds due to the identity of the
connected correlation function and the propagator,
$\langle s\, s^\dagger \rangle^{\rm c}_{(s|d,p)} = \delta^2 \log Z_p / \delta J\, \delta J^\dagger \,|_{J=0} = D_p$.
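The free-theory results of Eqs. 11, 12, and 16 can be checked on a small discretized toy problem. The following is a minimal sketch, not part of the original analysis: the pixel numbers, the random response, and the covariances are purely illustrative assumptions. It builds the information source j and the propagator D, forms the map m = D j, and compares it with the equivalent data-space form of the Wiener filter, S R^T (R S R^T + N)^{-1} d.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_data = 8, 5  # hypothetical toy dimensions

# Illustrative (assumed) signal covariance S, response R, noise covariance N.
A = rng.normal(size=(n_pix, n_pix))
S = A @ A.T + n_pix * np.eye(n_pix)   # symmetric, positive definite
R = rng.normal(size=(n_data, n_pix))  # linear response
N = 0.5 * np.eye(n_data)              # white noise

# One data realization d = R s + n (Eq. 6).
s = rng.multivariate_normal(np.zeros(n_pix), S)
d = R @ s + rng.multivariate_normal(np.zeros(n_data), N)

N_inv = np.linalg.inv(N)
j = R.T @ N_inv @ d                                    # information source, Eq. 12
D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)  # propagator, Eq. 11
m = D @ j                                              # Wiener filter map, Eq. 16

# Equivalent data-space form of the same linear filter.
m_alt = S @ R.T @ np.linalg.solve(R @ S @ R.T + N, d)
```

Both forms agree to numerical precision; the signal-space form used in the text is the more convenient one when the inverse signal covariance has a simple structure.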
C. Classical field theory



In case of the free theory, the map, Eq. 16, would also
be obtained from a classical treatment of the Hamiltonian
by extremizing it:

$$ \left. \frac{\delta H_p[s]}{\delta s} \right|_{s = m_p} = 0. \tag{18} $$

For a Hamiltonian with interaction terms the classical
field (in field-theoretical language) or MAP estimator (in
signal processing language) is a useful approximation to
the correct expectation value. The inverse Hessian in the
signal Hilbert space around this map,

$$ \left( \left. \frac{\delta^2 H_p[s]}{\delta s\, \delta s^\dagger} \right|_{s = m_p} \right)^{-1}, \tag{19} $$

characterizes the uncertainty. For the free theory, this is
the propagator, as given by Eq. 17.

The identity of fully field theoretical and classical re- 
sults holds only for the case of a free theory. However, the 
latter is often an acceptable approximation to the former, 
while much easier to derive. Therefore, we will make also 
use of the classical approximation in the following. 



D. Parameter uncertainty and posterior

In many applications, there are parameters specifying
the likelihood and prior, and thereby the coefficients of
the Hamiltonian, which are not precisely known. These
parameters, in the following denoted by the abstract
vector p, are either to be determined from the data, to be
marginalized over, or to be simultaneously determined
with the signal.

In such a case we have to construct the joint posterior
of the signal and the parameter given the data. This is
given according to Bayes' theorem as

$$ P(s,p|d) = \frac{P(d,s,p)}{P(d)} = P(s|d,p)\, \frac{P(d|p)}{P(d)}\, P(p), \tag{20} $$

where we had to introduce the parameter prior P(p).
The last expression contains a Bayes factor P(d|p)/P(d),
the ratio of the evidence of the data for a specific parameter
set to that of the model as a whole. Thus, the joint
posterior is weighted towards model parameters for which the
data provide larger evidence in addition to any
prior weighting.

The definition of the Hamiltonian for fixed parameters
as $H_p[s] = -\log P(d,s|p)$ permits us to construct the
joint partition function

$$ Z[J,K] = \int dp \int \mathcal{D}s\; P(s,p|d)\, P(d)\, e^{J^\dagger s + K^\dagger p} = \int dp\; P(p)\, e^{K^\dagger p}\, Z_p[J], \tag{21} $$

which is built upon $Z_p[J]$, the partition function of the
theory for given parameters p; here
$P(s,p|d)\, P(d) = P(d,s|p)\, P(p)$ with $P(d,s|p) = e^{-H_p[s]}$ was used.
The information field estimator, marginalized over the
unknown parameters, is then simply given by

$$ \langle s \rangle_{(s|d)} = \left. \frac{\delta \log Z[J,K]}{\delta J} \right|_{J,K=0} = \frac{1}{Z} \int dp\; P(p) \left. \frac{\delta Z_p[J]}{\delta J} \right|_{J=0} = \int dp\; \underbrace{\frac{P(p)\, Z_p}{Z}}_{P(p|d)}\, \langle s \rangle_{(s|d,p)}. \tag{22} $$

The aim of this work is to provide schemes to
calculate this parameter marginalized signal mean. It is not
just the signal estimator weighted with the parameter
prior P(p), but is additionally weighted by a parameter
likelihood factor $P(d|p) = Z_p$, so that the
parameter-dependent signal means are averaged over the parameter
posterior P(p|d). Therefore, parameter values which are
especially compatible with the data automatically get a
larger weight, as recognized before.
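The posterior-weighted average of Eq. 22 can be made concrete in a one-pixel toy model. The sketch below assumes a scalar signal with unknown variance p, unit response, noise variance N, and a hypothetical discrete prior on three values of p; all numbers are illustrative. For each p the evidence is Z_p = G(d, p + N) and the Wiener map is m_p = p d / (p + N). The marginalized mean is compared with a brute-force integration over the signal.

```python
import numpy as np

def gauss(x, var):
    """Zero-mean Gaussian density with variance var, evaluated at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)

d, N = 1.3, 1.0                       # assumed data value and noise variance
p_grid = np.array([0.1, 1.0, 10.0])   # hypothetical candidate signal variances
prior = np.array([0.5, 0.3, 0.2])     # hypothetical prior weights P(p)

Z_p = gauss(d, p_grid + N)                  # evidence P(d|p) for each p
post = prior * Z_p / np.sum(prior * Z_p)    # parameter posterior P(p|d)
m_p = p_grid / (p_grid + N) * d             # Wiener maps for each p
m_marg = np.sum(post * m_p)                 # Eq. 22

# Brute-force check: integrate s over the parameter-marginalized joint P(d, s).
s = np.linspace(-40.0, 40.0, 200001)
joint = sum(w * gauss(d - s, N) * gauss(s, p) for w, p in zip(prior, p_grid))
m_brute = np.sum(s * joint) / np.sum(joint)
```

The weights `post` differ from `prior` because parameter values with larger evidence are favoured, which is the Bayes-factor weighting described in the text.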



E. Effective marginalized Hamiltonian 

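If a parameter-dependent Hamiltonian $H_p[s] = -\log P(d,s|p)$
describes the conditional probability of
the signal and data given the parameters, an effective,
parameter-marginalized Hamiltonian H[s] is defined by

$$ e^{-H[s]} \equiv P(d,s) = \int dp\; P(d,s,p) = \int dp\; P(d,s|p)\, P(p) = \int dp\; e^{-H_p[s] - E_p}, \tag{23} $$

with $E_p = -\log P(p)$ the parameter-prior-energy. It is
crucial that $H_p[s]$ obeys the correct normalization
condition, $\int \mathcal{D}d \int \mathcal{D}s\, \exp(-H_p[s]) = 1$, otherwise a hidden
prior on p may enter the calculation.

In many cases, an analytical calculation of the effective
Hamiltonian will be out of reach. Since the perturbative
field theoretical treatment requires a polynomial
representation anyway, it is often easier to obtain the
coefficients of the effective Hamiltonian separately by
Taylor-Frechet expansion around a reference field configuration
t, so that $s = t + \phi$. The Hamiltonian for $\phi$ is then

$$ H[\phi] = H_0 - j^\dagger \phi + \frac{1}{2}\, \phi^\dagger D^{-1} \phi + \sum_{n=3}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, \phi_{x_1} \cdots \phi_{x_n}, \tag{24} $$

with

$$ H_0 = H[t] = -\log \int dp\; e^{-H_p[t] - E_p}, $$

$$ j_x = - \left. \frac{\delta H[s]}{\delta s_x} \right|_{s=t} = - \left\langle \frac{\delta H_p[s]}{\delta s_x} \right\rangle_{(p|d,s=t)}, $$

$$ D^{-1}_{xy} = \left. \frac{\delta^2 H[s]}{\delta s_x\, \delta s_y} \right|_{s=t} = \left\langle \frac{\delta^2 H_p[s]}{\delta s_x\, \delta s_y} - \frac{\delta H_p[s]}{\delta s_x}\, \frac{\delta H_p[s]}{\delta s_y} \right\rangle_{(p|d,s=t)} + j_x\, j_y, \quad\text{and} $$

$$ \Lambda^{(n)}_{x_1 \ldots x_n} = \frac{1}{n!} \sum_{\pi \in \mathcal{P}_n} \left. \frac{\delta^n H[s]}{\delta s_{x_{\pi(1)}} \cdots \delta s_{x_{\pi(n)}}} \right|_{s=t}. $$

Here, $\langle \ldots \rangle_{(p|d,s)} = \int dp\, \ldots\, P(p|d,s)$ provides
expectation values with respect to the parameter p given the
data d and the signal s. Repeated coordinate indices
are thought to be integrated over. The interaction
coefficients $\Lambda^{(n)}_{x_1 \ldots x_n}$ are symmetrized by averaging over all
possible permutations $\pi$ from the space of permutations
$\mathcal{P}_n$. In general, $D^{-1}_{xy}$ needs to be symmetrized, too, but
we have left out the symmetrization in the above equation
for convenience, since in the cases we consider $D^{-1}_{xy}$
is already symmetric.

In case the expansion was around t = 0, then

$$ H_0 = -\log \int dp\; e^{-H_{0,p} - E_p}, $$

$$ j = \langle j_p \rangle_{(p|d,s=0)}, $$

$$ D^{-1} = \langle D_p^{-1} - j_p\, j_p^\dagger \rangle_{(p|d,s=0)} + j\, j^\dagger, \tag{25} $$

$$ \Lambda^{(3)} = \langle \Lambda_p^{(3)} + 3\, D_p^{-1} \otimes j_p - j_p\, j_p\, j_p \rangle_{(p|d,s=0)} - 3\, D^{-1} \otimes j + j\, j\, j, $$

$$ \Lambda^{(4)} = \langle \Lambda_p^{(4)} + 4\, \Lambda_p^{(3)} \otimes j_p - 3\, D_p^{-1} \otimes D_p^{-1} + 6\, D_p^{-1} \otimes j_p\, j_p - j_p\, j_p\, j_p\, j_p \rangle_{(p|d,s=0)} - 4\, \Lambda^{(3)} \otimes j + 3\, D^{-1} \otimes D^{-1} - 6\, D^{-1} \otimes j\, j + j\, j\, j\, j, \;\ldots $$

Here, an implicit tensor notation was used,
with e.g. $(j\, j\, j)_{xyz} = j_x\, j_y\, j_z$, and we defined
the symmetrized tensor product
$(A \otimes j)_{x_1 x_2 x_3} = \frac{1}{3!} \sum_{\pi \in \mathcal{P}_3} A_{x_{\pi(1)} x_{\pi(2)}}\, j_{x_{\pi(3)}}$.
For higher rank tensors, the symmetrized tensor product is defined in an
analogous way.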
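The first two coefficients of the expansion around t = 0 can be verified numerically for a scalar signal. The sketch below is an illustration under stated assumptions: the three parameter values, their prior weights `w` (playing the role of exp(-E_p)), and the per-parameter coefficients `Dinv_p`, `j_p`, `H0_p` are arbitrary made-up numbers. It evaluates j and D^{-1} from the posterior averages and compares them with finite-difference derivatives of the effective Hamiltonian H[s] of Eq. 23.

```python
import numpy as np

# Hypothetical discrete parameter set with prior weights w = exp(-E_p).
w = np.array([0.2, 0.5, 0.3])
Dinv_p = np.array([0.5, 1.0, 3.0])   # inverse propagators D_p^{-1}
j_p = np.array([0.3, -0.2, 0.7])     # information sources j_p
H0_p = np.array([0.1, 0.4, -0.3])    # constants H_{0,p}

def H(s):
    """Effective Hamiltonian H[s] = -log sum_p w_p exp(-H_p[s]) for scalar s."""
    H_p = 0.5 * Dinv_p * s**2 - j_p * s + H0_p
    return -np.log(np.sum(w * np.exp(-H_p)))

# Parameter posterior at s = 0: P(p|d, s=0) is proportional to w_p exp(-H_{0,p}).
post = w * np.exp(-H0_p)
post /= post.sum()

j = np.sum(post * j_p)                            # j = <j_p>
Dinv = np.sum(post * (Dinv_p - j_p**2)) + j**2    # D^{-1} = <D_p^{-1} - j_p j_p> + j j

# Finite-difference derivatives of H at s = 0 for comparison.
h = 1e-4
j_fd = -(H(h) - H(-h)) / (2.0 * h)
Dinv_fd = (H(h) - 2.0 * H(0.0) + H(-h)) / h**2
```

The extra terms in D^{-1} beyond the plain average of D_p^{-1} are what turns the marginalized theory into an interacting one even though each fixed-p theory is free.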



III. SIGNAL SPECTRUM UNCERTAINTY 
A. Spectrum parameterization 

Our example application of IFT with parameter uncer- 
tainties is the reconstruction of a Gaussian signal with 
unknown variance, which we introduce now. 

The signal covariance $(S_p)_{xy} = \langle s_x\, s_y \rangle_{(s|p)}$ may exhibit
any dependence on the spatial coordinates as long as the
matrix is symmetric and positive definite. In the
cosmologically relevant case of translationally and rotationally
invariant signal statistics, the signal covariance is fully
characterized by its power spectrum. This means, there
is an orthonormal basis O of the signal Hilbert space
which diagonalizes $S_p$:

$$ (O\, S_p\, O^\dagger)_{kq} = O_{kx}\, (S_p)_{xy}\, \overline{O}_{qy} = \mathbb{1}_{kq}\, P_{S_p}(k), \tag{26} $$

with $\mathbb{1}_{kq}$ the identity in the transformed basis, $P_{S_p}(k)$
the power-spectrum, and using the Einstein sum
convention. In case we are dealing with a signal over a
d-dimensional Cartesian space, $O_{kx} = \exp(i k x)$ is simply
a Fourier transformation and the Fourier space identity
is $\mathbb{1}_{kq} = (2\pi)^d\, \delta(k - q)$, provided the scalar product in
Fourier space is adopted as $a^\dagger b = (2\pi)^{-d} \int dk\, \overline{a(k)}\, b(k)$.
However, since the theory should also be applicable in
curved spaces like the sphere, or even in spaces without
translational invariance, we formulate it in an abstract
way and just assume that the basis O diagonalizes the
signal covariance, which is always possible.

In general, the signal covariance $S_p$ may also exhibit
any dependence on the unknown parameter p of the
problem, as the power spectrum in cosmology is a complicated
function of the cosmological parameters. However, in
order not to depend on a specific model, we model the
power spectrum as being a linear combination of a
number of positive basis functions $f_i(k)$ with disjoint
supports (the spectral bands) with respect to the basis $O_{kx}$,
so that

$$ P_{S_p}(k) = \sum_i p_i\, f_i(k) \tag{27} $$

is positive for all k (all coefficients of $p = (p_i)_i$ are positive
and the spectral bands cover the full k-space domain).
We define

$$ (S_i)_{xy} = \overline{O}_{kx}\, f_i(k)\, O_{ky} \tag{28} $$

and therefore have

$$ S_p = \sum_i p_i\, S_i. \tag{29} $$

Since we also need the inverse of the covariance matrix
we further define

$$ g_i(k) = \begin{cases} 1/f_i(k) & \text{if } f_i(k) > 0, \\ 0 & \text{otherwise}, \end{cases} \tag{30} $$

and the pseudo-inverse of the band-variances,

$$ (S_i^{-1})_{xy} = \overline{O}_{kx}\, g_i(k)\, O_{ky}, \tag{31} $$

so that

$$ S_p^{-1} = \sum_i p_i^{-1}\, S_i^{-1} \tag{32} $$

is the inverse of $S_p$, as one can easily verify.

B. Spectral prior and joint Hamiltonian

For definiteness, we assume that the individual
signal-band amplitudes $p_i$ have independent prior distributions,

$$ P(p) = \prod_i P(p_i), \tag{33} $$

with the individual priors being given by inverse Gamma
distributions, which are power-laws with exponential
low-amplitude cutoff at $q_i$:

$$ P(p_i) = \frac{1}{q_i\, \Gamma(\alpha_i - 1)} \left( \frac{p_i}{q_i} \right)^{-\alpha_i} \exp\left( -\frac{q_i}{p_i} \right). \tag{34} $$

For $\alpha_i \gg 1$ this is an informative prior, where $q_i/\alpha_i$
determines the preferred value. A non-informative prior
would be given by Jeffreys prior with $\alpha_i = 1$ and $q_i = 0$.^1
The joint Hamiltonian is therefore

$$ H[s,p] \equiv -\log P(d,s,p) = H_p[s] + E(p), \tag{35} $$

with the parameter prior energy

$$ E(p) = \sum_i \left[ \frac{q_i}{p_i} + \alpha_i \log\left( \frac{p_i}{q_i} \right) + \log\big( q_i\, \Gamma(\alpha_i - 1) \big) \right]. \tag{36} $$
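The band parameterization of Eqs. 27 to 32 can be illustrated on a periodic one-dimensional pixel grid. The following sketch is an assumption-laden toy: it uses a unitary discrete Fourier matrix as the basis O, three hypothetical bands in |k|, an arbitrary falling shape for the basis functions f_i, and made-up amplitudes p. It builds the band matrices S_i, their pseudo-inverses, and checks that the sum of p_i^{-1} S_i^{-1} inverts S_p, and that Tr[S_i^{-1} S_i] counts the modes in each band.

```python
import numpy as np

n = 16
O = np.fft.fft(np.eye(n), norm="ortho")        # unitary DFT matrix, O[k, x]
k_abs = np.abs(np.fft.fftfreq(n, d=1.0 / n))   # integer |k| of each Fourier mode

# Hypothetical spectral bands with disjoint supports covering all modes.
band_masks = [k_abs <= 1, (k_abs > 1) & (k_abs <= 4), k_abs > 4]
p = np.array([5.0, 1.0, 0.2])                  # assumed band amplitudes p_i

def to_pixel_space(diag_k):
    """Return O^dagger diag(diag_k) O, a pixel-space matrix (Eqs. 28 and 31)."""
    return O.conj().T @ np.diag(diag_k) @ O

S_bands, S_inv_bands = [], []
for mask in band_masks:
    f_i = np.where(mask, 1.0 / (1.0 + k_abs) ** 2, 0.0)               # basis function f_i(k)
    g_i = np.divide(1.0, f_i, out=np.zeros_like(f_i), where=f_i > 0)  # Eq. 30
    S_bands.append(to_pixel_space(f_i))
    S_inv_bands.append(to_pixel_space(g_i))

S_p = sum(p_i * S_i for p_i, S_i in zip(p, S_bands))                  # Eq. 29
S_p_inv = sum(S_i_inv / p_i for p_i, S_i_inv in zip(p, S_inv_bands))  # Eq. 32

# Degrees of freedom per band, Tr[S_i^{-1} S_i].
rho = np.array([np.trace(Si_inv @ S_i).real
                for Si_inv, S_i in zip(S_inv_bands, S_bands)])
```

Because the band supports are disjoint, cross terms between different bands vanish, which is why the simple sum over pseudo-inverses is an exact inverse.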



C. Generic filter formula



In the following, we derive five approximate filters for 
this problem. It will turn out that they can all be cast 
into a single set of determining equations, with different 
coefficients. This generic filter formula should be pre- 
sented first, before we discuss the individual approaches. 

All of the derived filters can be expressed as Wiener
filters for some specific spectrum $S_{p^*} = \sum_i p_i^*\, S_i$ with
different spectral parameters $p^*$. The signal map and
the spectrum assumed for its construction have to be
calculated self-consistently from

$$ m_{p^*} = D_{p^*}\, j, \quad\text{and}\quad p_i^* = \frac{q_i + \frac{1}{2}\, {\rm Tr}\left[ \left( m_{p^*}\, m_{p^*}^\dagger + \delta_i\, D_{p^*} \right) S_i^{-1} \right]}{\gamma_i + \varepsilon_i}, \tag{37} $$

for example by simply iterating these two equations.
Here, the filter-specific parameters are $\varepsilon_i$ and $\delta_i$, and
$\gamma_i = \alpha_i - 1 + \varrho_i/2$, where $\varrho_i = {\rm Tr}[S_i^{-1} S_i]$ is the number of
degrees of freedom of the ith spectral band. In order to
simplify notation, we drop in the following the * from $p^*$,
assuming that the context makes it clear whether we talk
about the unknown parameter p or a parameter choice
$p^*$ for a specific filter.
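The self-consistent iteration of the generic filter formula can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the update rule p_i = (q_i + Tr[(m m^T + delta_i D) S_i^{-1}] / 2) / (gamma_i + eps_i) with gamma_i = alpha_i - 1 + rho_i / 2, uses dense matrices, a fixed number of iterations, and a small positive floor on p to keep the inverse spectrum finite. The function name `generic_filter` and the toy data are hypothetical. With delta = 1 and eps = 0 (the critical filter with Jeffreys prior), unit response, and unit white noise, the fixed point for each band is the mean squared data value minus the noise variance.

```python
import numpy as np

def generic_filter(d, R, N, S_bands, delta, eps, alpha=1.0, q=0.0, n_iter=200):
    """Iterate m = D_p j and the assumed band-amplitude update until converged."""
    N_inv = np.linalg.inv(N)
    M = R.T @ N_inv @ R
    j = R.T @ N_inv @ d
    S_inv_bands = [np.linalg.pinv(S_i) for S_i in S_bands]
    rho = np.array([np.trace(Si_inv @ S_i)
                    for Si_inv, S_i in zip(S_inv_bands, S_bands)])
    gamma = alpha - 1.0 + rho / 2.0
    p = np.ones(len(S_bands))                       # starting guess for the spectrum
    for _ in range(n_iter):
        S_inv = sum(Si_inv / p_i for Si_inv, p_i in zip(S_inv_bands, p))
        D = np.linalg.inv(S_inv + M)                # propagator for the current spectrum
        m = D @ j                                   # Wiener filter map
        B = np.outer(m, m) + delta * D
        p_new = np.array([q + 0.5 * np.trace(B @ Si_inv) for Si_inv in S_inv_bands])
        p = np.maximum(p_new / (gamma + eps), 1e-12)  # floor avoids division by zero
    return m, p, D

# Toy problem: 8 pixels, two bands of 4 pixels each, unit response and noise.
n_pix = 8
R = np.eye(n_pix)
N = np.eye(n_pix)
S_bands = [np.diag([1.0] * 4 + [0.0] * 4), np.diag([0.0] * 4 + [1.0] * 4)]
d = np.array([3.0, -3.0, 3.0, -3.0, 2.0, -2.0, 2.0, -2.0])

m, p, D = generic_filter(d, R, N, S_bands, delta=1.0, eps=0.0)
```

In this toy case the mean squared data values are 9 and 4, so the iteration settles at p = (8, 3). Other filters of the family are obtained by changing `delta` and `eps` only.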

In order to develop a filter for our signal, we have to
decide according to which principle the signal or the power
spectrum used in the Wiener filtering is determined. In
the space of all possibilities for the signal and its power
spectrum the joint probability function P(s,p|d) has to
be interrogated. There are different hyperplanes in this space
along which this function can be cut, marginalized, and
maximized. The ultimate answer of the PURE approach
will come from marginalizing p and calculating the signal
mean. However, first we want to establish more
traditional signal estimators, using largely the MAP principle
along different cuts through the joint signal and spectral
parameter space.

Figure 1: Parameters $\delta_i$ and $\varepsilon_i$ of the five different filters for
Jeffreys prior in the representation of the generic filter formula
Eq. 37. The parameters of the displayed filters are derived in the
following sections: the critical filter in Sec. III D, the classical
filter in Sec. III G, the joint MAP filter in Sec. III E, the MAP
spectrum filter in Sec. III F, and the PURE filter in Sec. V E.
The critical line between filters with and without perception
threshold as given by Eq. 64 is also shown.

In case a Jeffreys prior is adopted ($q_i = 0$ and $\alpha_i = 1$)
it will turn out that the trivial filter m(d) = 0 would
be the preferred solution in all cases. However, since
Jeffreys prior is an improper prior which conveniently
represents the class of very broad, but proper priors, we
should not hesitate to remove the trivial filter solution by
hand. Otherwise we would need to enter the discussion
about an appropriate informative prior, which we like to
avoid for simplicity. This cannot be decided generically,
but only for each concrete inference problem individually.

The parameters of the filters described in Sec. I B and
derived in the next few subsections are summarized in
Fig. 1.



^1 Since this would result in an improperly normalized prior, we
understand this as $\alpha_i = 1 + \epsilon$, $q_i = \epsilon$, and $\lim_{\epsilon \to 0}$ at the end of
the calculation. We note that this limit might not exist, or that
it provides trivial results. For instance, we will find in Sec. V A that in
this limit the signal reconstructed with the full field theory turns
out to be zero and the data is assumed to be purely made of
noise. Thus the improper Jeffreys prior is actually inappropriate
for the full problem, although interesting.



D. The critical filter 

Our first filter can be understood without any reference 
to statistical inference and is along the lines of the well 
known Karhunen-Loeve (KL, [13j ESI [3T] ) and Feldman- 
Kaiser-Peacock (FKP, |S]) estimators for power spectra. 
The Wiener filter map $m_p = D_p\, j$ (with $D_p = (S_p^{-1} + R^\dagger N^{-1} R)^{-1}$
and $j = R^\dagger N^{-1} d$) of a data realization of a
Gaussian random signal with a known covariance $S_p$ will
have on average the covariance

$$\langle m_p\, m_p^\dagger \rangle_{(d,s|p)} = S_p - D_p, \qquad (38)$$

as one can verify with a short calculation.^ The propagator
on the rhs just accounts for the power lost in measurement
and filtering. Now we assume that our data
and our Wiener filter map are so rich or typical that this
equation also holds for our individual data realization.
Thus we drop the expectation angles, apply $\mathrm{Tr}[\,\cdot\; S_i^{-1}]$,
and get the critical filter recipe in the form of Eq. (37)
with parameters $\delta_i = 1$, $\epsilon_i = 0$, $\alpha_i = 1$, and $q_i = 0$. The
last two parameters are characteristic for Jeffreys prior,
which we obviously have assumed implicitly, since no
prior information on the spectrum, or even its magnitude
on a logarithmic scale, has entered the critical filter
scheme.
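As an illustration of the critical filter recipe, the following minimal sketch iterates it for a single mode. The setting is a toy assumption that does not appear in the text: unit response $R = 1$, one signal mode, one data point `d`, and a known noise variance `noise_var`; the function name is hypothetical. In this case the recipe reduces to $p = m^2 + D$, whose non-trivial fixed point is $p = d^2 - N$ for $d^2 > N$.

```python
def critical_filter_scalar(d, noise_var, p_start=1.0, n_iter=500):
    """Iterate the critical filter recipe for a single mode (toy assumption: R = 1).

    Returns the spectral amplitude p and the Wiener filter map m after n_iter steps.
    """
    p = p_start
    for _ in range(n_iter):
        D = 1.0 / (1.0 / p + 1.0 / noise_var)  # propagator D_p = (S_p^-1 + N^-1)^-1
        m = D * d / noise_var                  # Wiener filter map m_p = D_p j
        p = m * m + D                          # map power plus power lost in filtering
    D = 1.0 / (1.0 / p + 1.0 / noise_var)
    m = D * d / noise_var
    return p, m


# Data variance above the noise variance: p converges to d^2 - N.
p_high, m_high = critical_filter_scalar(3.0, 1.0)
# Data variance below the noise variance: p decays (slowly) towards zero.
p_low, m_low = critical_filter_scalar(0.5, 1.0)
```

In this toy case the iteration converges quickly above the threshold $d^2 = N$, whereas below it the amplitude decays only algebraically with the number of iterations, so a convergence criterion based on small relative changes could stop too early.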






The name critical filter should become clear in Sec. VI.
There, we show that, at least in cases where the different
spectral parameters are independent of each other, the
different filters can be cast into two classes, those with
and those without a perception threshold. The critical filter
marks the demarcation line between these phases.

The critical filter has recently been applied success- 
fully by P7 to reconstruct an all sky map of the galactic 
Faraday depth from sparse and noisy measurements.

Figure 2: The effective signal Hamiltonian, Eq. (40), without
the normalization constant $H_0$, in case of Jeffreys prior and
for a single, independent signal $s = s_i$ and data point $d = d_i$.
The parameters are $R_{ij} = \delta_{ij}$ and $S_{ij} = p_i\, \delta_{ij}$. The
different curves show the Hamiltonian for representative data
values. The triangle symbols mark the results of the inverse
response estimator $m_{\mathrm{ir}} = R^{-1} d$ on the corresponding curves.
The large open and small filled circles mark the renormalized
and classical map estimator results, respectively. The
existence of a classical perception threshold can be seen: for
$-2 < d < 2$, the classical map is exactly zero since no non-trivial
stationary point of the Hamiltonian exists. The thin
dotted line shows the renormalized Hamiltonian for the case
$d = 3$, as provided by $\frac{1}{2} (s - m_p)^\dagger D_p^{-1} (s - m_p)$.



E. Joint MAP filter 



Extremizing the joint Hamiltonian, Eq. (35), with respect
to $p$ and $s$ yields the joint MAP filter parameters
$(\delta_i, \epsilon_i) = (0, 1)$. We note that if we extremize with respect
to the log-spectral amplitudes $\tau_i = \log p_i$, the parameters
$(\delta_i, \epsilon_i) = (0, 0)$ would have resulted, due to the
effect of the Jacobian of the prior transformation. This
latter filter is identical to the classical one derived below
in Sec. III G.



Dp = {S-^+M)-\ Qp^SpM, and 

^0 = ^log|7V| + irft7Vd + ^log(g,r(a,-l)). 

i 



F. MAP spectrum filter 



Marginalizing the joint Hamiltonian Eq. [35] over the 
signal space provides the spectrum Hamiltonian 

H{p) = -log(P(d,p)) = -log(P(d|p)F(p)) 

= hog\i + Qp\-^j^ Dp J + h;, 



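^ Using the abbreviation $M = R^\dagger N^{-1} R$ we write
$\langle m_p\, m_p^\dagger \rangle_{(d,s|p)} = D_p \langle j\, j^\dagger \rangle_{(d,s|p)} D_p = D_p R^\dagger N^{-1} (R\, S_p R^\dagger + N) N^{-1} R\, D_p$
$= D_p (M S_p M + M) D_p = D_p M (1 + S_p M) (1 + S_p M)^{-1} S_p$
$= D_p M S_p = S_p (1 + M S_p)^{-1} (1 + M S_p - 1) = S_p - D_p$.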



Here we used Eq. (15) for $P(d|p)$. A data-space view on this
likelihood is given in Appendix A. Extremizing $H(p)$ with
respect to $p_i$ and collecting the terms linear in it provides
the MAP-spectrum parameters $(\delta_i, \epsilon_i) = (1, 1)$.

If we extremize with respect to the parameters $\tau_i = \log p_i$,
we get $(\delta_i, \epsilon_i) = (1, 0)$, the parameters of the critical
filter. Thus, the critical filter can be regarded as
the one resulting from a MAP spectrum estimation on
a logarithmic scale. Note that MAP estimators are sensitive
to the coordinate system in which parameters are
expressed.

G. Classical map estimator

The effective, parameter-marginalized signal Hamiltonian,
Eq. (23), can be calculated analytically:^

$$H[s] = \frac{1}{2} s^\dagger M s - j^\dagger s + \sum_i \gamma_i \log\left(q_i + \frac{1}{2} s^\dagger S_i^{-1} s\right) + H_0, \quad \text{with}$$

$$H_0 = \frac{1}{2} d^\dagger N^{-1} d + \frac{1}{2} \log(|2\pi N|) - \log \prod_i \frac{\Gamma(\gamma_i)\, q_i^{\alpha_i - 1}}{\Gamma(\alpha_i - 1)\, |2\pi S_i|^{1/2}}. \qquad (40)$$



The classical mapping equation results from extremizing
this Hamiltonian and is provided by Eq. (37) for
$(\delta_i, \epsilon_i) = (0, 0)$. This can be regarded as a poor man's
critical filter, since only the power in the map is used to
determine the signal covariance, and no correction for the
power lost in the filtering is applied. In case of a single
independent data and signal point, the effective Hamiltonian
is a one-dimensional function in signal space and
is shown in Fig. 2.
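The single-point case can be worked out in closed form. The sketch below assumes a toy setting that is not spelled out in the text: unit response, Jeffreys prior, one data point `d`, and noise variance `noise_var` (the function name is hypothetical). The classical filter then uses $p = m^2$ with $m = p\,d/(p + N)$, so non-trivial solutions obey $(p + N)^2 = p\,d^2$ and exist only for $d^2 \geq 4N$, which is the threshold $|d| \geq 2$ for $N = 1$.

```python
import math


def classical_filter_scalar(d, noise_var):
    """Largest stationary point of the single-mode classical filter (toy: R = 1).

    Returns (p, m): the assumed signal variance and the resulting map value.
    Below the perception threshold d^2 < 4 N only the trivial solution remains.
    """
    if d * d < 4.0 * noise_var:
        return 0.0, 0.0
    disc = d ** 4 - 4.0 * noise_var * d * d
    p = 0.5 * (d * d - 2.0 * noise_var + math.sqrt(disc))  # larger root of (p + N)^2 = p d^2
    m = p * d / (p + noise_var)                            # Wiener filter map for this p
    return p, m


p_above, m_above = classical_filter_scalar(3.0, 1.0)
p_below, m_below = classical_filter_scalar(1.5, 1.0)
```

For $d = 3$ and $N = 1$ the map value coincides with the larger stationary point $(d + \sqrt{d^2 - 4N})/2$ of the one-dimensional Hamiltonian $\frac{1}{2} s^2/N - d\,s/N + \frac{1}{2} \log(s^2/2)$, which is the scalar form of Eq. (40) under the stated assumptions.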



IV. UNCERTAINTY RENORMALIZATION 
FLOW 

A. General remarks 

Although the MAP methods often provide acceptable
signal estimators, they are not optimal in an $\mathcal{L}^2$-error
norm sense. In case of a skewed posterior, such reconstructions
are suboptimal. Our goal is to calculate moments
of the signal field averaged over the effective posterior,
e.g. $\langle s \rangle_{(s|d)}$ given by Eq. (22), since those optimize
the $\mathcal{L}^2$-error. For this we might construct the effective
Hamiltonian exactly or in terms of a Taylor expansion as
in Eqs. (24) and (25).

Such an expansion of the effective Hamiltonian around 
a reference field is expected to work best when the para- 
meter prior is well localized around a specific value. The 
effective Hamiltonian will then be close to the original, 
parameter-dependent one for this parameter value. In 
case the original theory was free, the effective theory will 
have only small interaction terms. Diagrammatic expan- 
sions can then be conducted and truncated at low order. 

Unfortunately, in many practical applications, the uncertainties
of the parameters are substantial, and not described
by a well localized prior. In this case it might be
possible to construct the effective Hamiltonian by repeatedly
adding smaller portions of parameter uncertainty,
with each uncertainty dose so small that the resulting
Hamiltonian has only weak interactions, which can be
re-absorbed into renormalized, effective propagator and
data source terms. The accumulated uncertainty can
thereby become large and equal to the required amount
of entropy for the unknown parameter of the theory. In
the following we will explain the basics of this uncertainty
renormalization flow.



B. Parameter uncertainty renormalization 

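A broad prior for a parameter $p$ may be decomposed
into a number $N$ of narrow and mutually independent
priors for some auxiliary variables $\tau_j$ (with
$j \in \{1, \ldots, N\}$):

$$P(p) = \left[\prod_{j=1}^{N} \int d\tau_j\, P(\tau_j)\right] \delta\!\left(p - \sum_{j=0}^{N} \tau_j\right). \qquad (41)$$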



We have chosen here the parameter to be the sum of
the auxiliary variables for definiteness and simplicity, but
other relations can be worked out in a similar way or be
mapped onto this case. Also the mutual independence of
the auxiliary variables is mostly a technical convenience
and not a strict requirement. Note that we have included
a starting parameter value $\tau_0$ in the sum. Since it
would be convenient to identify this with the prior expectation
value $\langle p \rangle_{(p)}$ throughout the full renormalization
procedure, we require

$$\langle \tau_j \rangle_{(\tau_j)} = 0 \quad \text{for } j \geq 1, \qquad (42)$$

with $\tau_0 = p_0 = \langle p \rangle_{(p)}$. We further introduce the $l$-th parameter
residual as $r_l = \tau_0 + \sum_{j=l+1}^{N} \tau_j$, so that $r_0 = \sum_{j=0}^{N} \tau_j = p$
and $r_N = \tau_0 = p_0$. The effective Hamiltonian can now
be expressed as



$$e^{-H[s]} = \int dp\, P(p)\, e^{-H_p[s]}$$

$$= \int dp \left[\prod_{j=1}^{N} \int d\tau_j\, P(\tau_j)\right] \delta\!\left(p - \sum_{j=0}^{N} \tau_j\right) e^{-H_p[s]}$$

$$= \int d\tau_N\, P(\tau_N) \cdots \int d\tau_2\, P(\tau_2) \int d\tau_1\, P(\tau_1)\, e^{-H_{p_0 + \tau_1 + \cdots + \tau_N}[s]}$$

$$= e^{-H^{(N)}_{p_0}[s]}, \qquad (43)$$

^ The term $|S_i|$ has to be read as the determinant within the non-zero
subspace of $S_i$.

where $H^{(N)}_{p_0}[s] = H[s]$. This means that a series of effective
Hamiltonians with increasing accumulated parameter
uncertainty is defined, together with an uncertainty adding
operator:

$$H^{(n+1)}_{r_{n+1}}[s] = -\log \int d\tau_{n+1}\, P(\tau_{n+1})\, e^{-H^{(n)}_{r_{n+1} + \tau_{n+1}}[s]}. \qquad (44)$$

Note that $H^{(0)}_{r_0} = H_{p = r_0}$ and $H^{(N)}_{r_N} = H$. This uncertainty
renormalization can be done using Eq. (24) if it is
not possible to do it analytically. To each Hamiltonian a
time-like variable $t$ can be assigned, which measures the
amount of uncertainty accumulated so far. A suitable
variable is the accumulated uncertainty dispersion,



$$t_n = \sum_{i=1}^{n} \langle \tau_i^2 \rangle_{(\tau_i)}, \qquad (45)$$

where we used Eq. (42). In case all auxiliary variables have
the same prior, we find $t_n = n\, t_1 = \frac{n}{N} \langle (p - p_0)^2 \rangle_{(p)}$.
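The additivity of the accumulated uncertainty dispersion can be checked with an exact convolution of discrete distributions. The three-point step prior and the helper names below are illustrative assumptions, not taken from the text; the sketch only demonstrates that the variances of independent, zero-mean steps add, so that $t_n = n\, t_1$.

```python
def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete variables (value -> probability)."""
    out = {}
    for a, wa in dist_a.items():
        for b, wb in dist_b.items():
            out[a + b] = out.get(a + b, 0.0) + wa * wb
    return out


def variance(dist):
    """Variance of a discrete distribution given as value -> probability."""
    mean = sum(v * w for v, w in dist.items())
    return sum((v - mean) ** 2 * w for v, w in dist.items())


# Hypothetical narrow, zero-mean prior for one auxiliary variable tau_j.
step_prior = {-1: 0.25, 0: 0.5, 1: 0.25}
t_1 = variance(step_prior)

accumulated = {0: 1.0}   # sharp start at tau_0 (shifted to zero)
t_values = []
for n in range(1, 9):
    accumulated = convolve(accumulated, step_prior)
    t_values.append(variance(accumulated))   # accumulated dispersion t_n
```

Each entry of `t_values` equals `n * t_1`, and the accumulated distribution approaches a Gaussian shape as more steps are added.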

At each time-step a renormalization of the Hamiltonian
can be done, in which it is cast back into the structure
it had before, e.g. in our example of reconstruction with
unknown power spectrum the free Hamiltonian of Eq. (10),
just with modified coefficients (propagator, source
and interaction terms).

In our example the recast Hamiltonian is free, which 
implies that we are constructing a Gaussian approxima- 
tion of the parameter marginalized signal posterior to be 
used for inference. It is shown in [5] that the chosen 
Gaussian seems to be optimal in an information theoret- 
ical and thermodynamical sense. It maximizes the cross 
information with the correct effective posterior. 

A renormalization flow can further be established by
letting the individual time-steps of size $t_1$ become
infinitesimally small and their number $N$ infinitely
large, while keeping the total added uncertainty constant,
$t = N t_1$. The results are the renormalization flow equations
for the coefficients of the Hamiltonian. The actual
form of these equations depends on the Hamiltonian and
is much simpler if the Hamiltonian has fewer interactions.



Therefore, even a free Hamiltonian as in Eq. (10) should
be further simplified by suppressing the linear term $j^\dagger s$ as
far as possible. This is done for the value of $p = p_0$, which
is our starting point in parameter space, by changing to a
new field variable $\phi = s - m_0$ with $m_0 = D_0\, j_0$, $D_0 = D_{p_0}$,
and $j_0 = j_{p_0}$. The Hamiltonian reads now

$$H'_p[\phi] = H_p[s = m_0 + \phi] = \frac{1}{2} \phi^\dagger D_p^{-1} \phi - {j'_p}^\dagger \phi + H'_{0,p}, \quad \text{with}$$

$$j'_p = j_p - D_p^{-1} m_0 = j_p - D_p^{-1} D_0\, j_0, \quad \text{and} \qquad (46)$$

$$H'_{0,p} = H_{0,p} + \frac{1}{2} m_0^\dagger D_p^{-1} m_0 - j_p^\dagger m_0,$$

and is especially simple for $p = p_0$, since then $j'_{p_0} = 0$.
Now, the effective Hamiltonian is calculated and expanded
according to the recipe in Sec. II E for a parameter
prior well localized on $p = p_0$. The localization of
the prior is typically characterized by a small parameter
$\delta t = \sigma^2$, which also appears as a pre-factor of the various
coefficients of the effective Hamiltonian.



C. 4th order interactions

In order to perform the renormalization step, the recasting
of the uncertainty-marginalized Hamiltonian in
Eq. (44) into its original form, let us be a bit more specific
about the effective Hamiltonian for definiteness. By
virtue of our foresight on the calculations in Sec. V A, we
assume that up to linear order in $\delta t$ the effective Hamiltonian
is given by

$$H'[\phi] = H'_0 + \Lambda^{(1)\dagger} \phi + \frac{1}{2} \phi^\dagger \left(D_0^{-1} + \Lambda^{(2)}\right) \phi + \frac{1}{3!} \Lambda^{(3)}[\phi, \phi, \phi] + \frac{1}{4!} \Lambda^{(4)}[\phi, \phi, \phi, \phi] + \mathcal{O}(\delta t^2), \qquad (47)$$

with $\Lambda^{(1)}$, $\Lambda^{(2)}$, $\Lambda^{(3)}$, and $\Lambda^{(4)}$ being of order $\mathcal{O}(\delta t)$. Here,
Eq. (24) or (25) might have been used.



The corrections can be expected to be small, of $\mathcal{O}(\delta t)$,
since our originally free Hamiltonian, Eq. (46), should be
recovered in the limit of vanishing parameter uncertainty,
$\delta t \to 0$. All higher order interaction terms are of higher
order in $\delta t$ and are therefore ignored in the following. For
our later convenience we introduce

$$\lambda^{(n)} = \lim_{\delta t \to 0} \frac{\Lambda^{(n)}}{\delta t}. \qquad (48)$$



Now, we can renormalize by absorbing all diagrams
of order $\mathcal{O}(\delta t)$ into renormalized propagator and source
terms, in order to obtain a free Hamiltonian. Since $j'$ in
Eq. (46) is already of order $\delta t$ and only the three- and four-leg
vertices have contributions of order $\mathcal{O}(\delta t)$, only uncertainty
loop corrections have to be taken into account. We
can therefore define the renormalized data-source vertex
of the effective $\phi$-theory,

$$j_\# = -\Lambda^{(1)} - \frac{1}{2} \Lambda^{(3)}[\cdot, D] + \mathcal{O}(\delta t^2), \qquad (49)$$

which takes the dominant uncertainty-loop correction
into account. We dropped the subscript at $D_0$ and use
the Feynman rules provided in [S]. The renormalized
propagator up to linear order in $\delta t$ is

$$D_\# = D - D\, \Lambda^{(2)} D - \frac{1}{2} D\, \Lambda^{(4)}[\cdot, D, \cdot]\, D + \mathcal{O}(\delta t^2). \qquad (50)$$

The inverse renormalized propagator up to first order is

$$D_\#^{-1} = D^{-1} + \Lambda^{(2)} + \frac{1}{2} \Lambda^{(4)}[\cdot, D, \cdot]. \qquad (51)$$
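That the truncated propagator and its inverse agree up to terms of second order in $\delta t$ can be verified numerically in a scalar toy case. The sketch below is only an illustration: all quantities are plain numbers, $\Lambda^{(4)}[\cdot, D, \cdot]$ reduces to the product $\Lambda^{(4)} D$, and the coefficient values and the function name are arbitrary assumptions.

```python
def renormalized_propagators(D, lam2, lam4, dt):
    """Scalar analogue of the renormalized propagator and its inverse.

    Returns the propagator truncated at first order in dt and the one obtained
    by exactly inverting the first-order inverse propagator.
    """
    L2, L4 = dt * lam2, dt * lam4                      # Lambda^(n) = dt * lambda^(n)
    D_truncated = D - D * L2 * D - 0.5 * D * (L4 * D) * D
    D_from_inverse = 1.0 / (1.0 / D + L2 + 0.5 * L4 * D)
    return D_truncated, D_from_inverse


def mismatch(dt):
    a, b = renormalized_propagators(D=2.0, lam2=0.3, lam4=-0.1, dt=dt)
    return abs(a - b)


# Halving dt should reduce the mismatch by roughly a factor of four.
ratio = mismatch(1e-2) / mismatch(5e-3)
```

The quadratic scaling of the mismatch reflects that both expressions are only meant to hold to linear order in the added uncertainty.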



These coefficients now define a renormalized effective
Hamiltonian, $H'_\#[\phi] = \frac{1}{2} \phi^\dagger D_\#^{-1} \phi - j_\#^\dagger \phi$, which belongs
to a free theory, and is similar to $H'$ in that it
has the same mean and uncertainty dispersion by construction.
Higher order uncertainty correlations certainly
differ, due to the approximation of the renormalization
step. In contrast to the original Hamiltonian $H_{p_0}[s]$
in Eq. (10), which was also free, the renormalized Hamiltonian
has some amount of parameter uncertainty corrections
included.

Now, the original field $s = m_0 + \phi$ can be restored,
leading to a free Hamiltonian with

$$D_{t+\delta t} = D_\# \quad \text{and} \quad j_{t+\delta t} = j_\# + D_\#^{-1} m_t, \qquad (52)$$

where the subscript $t + \delta t$ indicates that the parameter
uncertainty is increased by $\delta t$ from its original value of
$t$. Since $\delta t$ can be made arbitrarily small, a system of
differential equations can be derived,



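$$\frac{dD_t}{dt} = \lim_{\delta t \to 0} \frac{D_{t+\delta t} - D_t}{\delta t} = -D_t\, \lambda^{(2)} D_t - \frac{1}{2} D_t\, \lambda^{(4)}[\cdot, D_t, \cdot]\, D_t, \quad \text{and}$$

$$\frac{dj_t}{dt} = \lim_{\delta t \to 0} \frac{j_{t+\delta t} - j_t}{\delta t} = -\lambda^{(1)} - \frac{1}{2} \lambda^{(3)}[\cdot, D_t] + \left(\lambda^{(2)} + \frac{1}{2} \lambda^{(4)}[\cdot, D_t, \cdot]\right) m_t, \qquad (53)$$

which form the uncertainty renormalization flow equations.
The pseudo-time $t$ measures the accumulated dispersion
of the resulting prior probability. These equations
can be transformed into the more compact form

$$\frac{dD_t^{-1}}{dt} = \lambda^{(2)} + \frac{1}{2} \lambda^{(4)}[\cdot, D_t, \cdot], \quad \text{and}$$

$$\frac{dm_t}{dt} = -D_t \left(\lambda^{(1)} + \frac{1}{2} \lambda^{(3)}[\cdot, D_t]\right), \qquad (54)$$

where $m_t = D_t\, j_t$.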

The renormalization equations so far are evolution 
equations for operators. If they should become ordinary 
partial differential equations, e.g. in our case in terms 
of spectral parameters, some sort of closure is required. 
This should ensure that the renormalized Hamiltonian 
gets its original structure, so that it is clear which terms 
are affected by the parameter uncertainty adding oper- 
ation. Ideally, the change in the Hamiltonian can be 
mapped onto changes of effective parameter values. 

After the repeated adding of small amounts of param- 
eter uncertainty, the resulting effective parameter prior 



distribution can be expected to be a Gaussian, due to the 
central limit theorem of statistics, 



(55) 



V. SIGNAL RECONSTRUCTION WITH PURE 
A. Lognormal spectral prior 

Now we want to apply the PURE scheme to our example
problem from Sec. III of how to reconstruct a Gaussian
signal with unknown covariance.

First, we have to express our spectral prior in a way
that we can apply the PURE method developed in the
previous section. For this we need some additive auxiliary
random variables into which we can decompose our
(unknown) spectral amplitudes. These variables should
each have an unbiased distribution with zero mean according
to Eq. (42). For the moment, we concentrate on a
single spectral parameter $p_i$ and change to the parameter
variable $\tau_i = \log p_i$, which can be split up into additive
auxiliary variables: $\tau_i = \sum_j \tau_{ij}$. For convenience we assume
$p_{ij} = e^{\tau_{ij}}$ to be distributed according to Eq. (34)
with properly chosen parameters $\alpha_{ij}$ and $q_{ij}$, as detailed
in Appendix B. There, it is shown that

$$P(\tau_i) = \mathcal{G}(\tau_i, t_i) \qquad (56)$$

for the limit of an infinite number of auxiliary parameters,
with a finite total uncertainty dispersion of $t_i = \langle \tau_i^2 \rangle_{(\tau_i)} - \langle \tau_i \rangle_{(\tau_i)}^2$,
as expected from the central limit theorem
of statistics. The resulting statistics for $p_i = e^{\tau_i}$
is therefore log-normal. If we take the limit $t_i \to \infty$ we
obtain Jeffreys prior, which is flat on a logarithmic scale,
and which conveniently permits us to compare the PURE
filter to the others.



B. Uncertainty renormalization 

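In the following we assume that all spectral coefficients
receive uncertainty with the same infinitesimal rate, so
that the prior distributions in Eq. (34) are all the same
and narrowly centered on $p_i = 1$, which implies $\delta t_i = 1/(\alpha_i - 1) \equiv \delta t$
and $q_i = \alpha_i - 3/2 = \delta t^{-1} - 1/2$ (see
Appendix B).

Expanding the Hamiltonian in Eq. (40) around the reference
map $m = D\, j$ recovers the original free Hamiltonian,
shifted to $\phi = s - m$, and perturbed by some additional
interaction terms $\Lambda^{(n)} = \delta t\, \lambda^{(n)} + \mathcal{O}(\delta t^2)$ with

$$\lambda^{(1)} = \sum_i \left(\frac{1 + \rho_i}{2} - \frac{1}{2}\, p_i^{-1}\, m^\dagger S_i^{-1} m\right) S_i^{-1} p_i^{-1}\, m,$$

$$\lambda^{(2)} = \sum_i \left[\left(\frac{1 + \rho_i}{2} - \frac{1}{2}\, p_i^{-1}\, m^\dagger S_i^{-1} m\right) S_i^{-1} p_i^{-1} - S_i^{-1} m\, m^\dagger S_i^{-1}\, p_i^{-2}\right],$$

$$\lambda^{(3)} = \lambda^{(4)}[\cdot, \cdot, \cdot, m], \qquad (57)$$

$$\lambda^{(4)} = -3 \sum_i p_i^{-2}\, S_i^{-1} \otimes S_i^{-1}, \quad \text{and}$$

$$\lambda^{(n)} = 0 \quad \text{for } n > 4.$$

Here we have reinserted $p_i$ in order to have variables
which capture the evolution of the renormalization flow
dynamics. The renormalization flow equations are given
by inserting the latter terms into two independent equations
out of Eqs. (53)-(55),

$$\frac{dD^{-1}}{dt} = \sum_i \left[\frac{1}{2 p_i} \left(1 + \rho_i - \frac{1}{p_i} \mathrm{Tr}[B_i]\right) S_i^{-1} - \frac{1}{p_i^2}\, S_i^{-1} B_i\right],$$

$$\frac{dj}{dt} = -\sum_i p_i^{-2} \left(m^\dagger S_i^{-1} m\right) S_i^{-1} m, \quad \text{with} \qquad (58)$$

$$B_i = (m\, m^\dagger + D)\, S_i^{-1} \quad \text{and} \quad m = D\, j.$$

This system of integro-differential equations represents
the most accurate form of the PURE filter for this application.
It is, however, in general quite expensive to
implement numerically, since it requires following the evolution
of matrices.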



C. Projection onto spectral parameterization

To simplify the PURE filter equations, we want to recast
the system into the original form, which assumes
$D^{-1} = S^{-1} + M$ with $S^{-1} = \sum_i p_i^{-1} S_i^{-1}$. Thus evolution
equations for the $p_i$ are needed. Since
$\frac{d}{dt} D^{-1} = \sum_i \frac{dp_i^{-1}}{dt}\, S_i^{-1} + \frac{dM}{dt}$ contains the parameter evolution,
one has to specify how to split the evolution equation of
the inverse propagator.

A natural way is to require all terms of the rhs of
Eq. (58) that are parallel to the inverse signal covariance
bands to contribute to their evolution, and the ones
that are orthogonal to contribute to the evolution of
$M$. The part of a matrix $A$ parallel to $S_i$ is obtained by
the projector

$$\mathcal{P}_i A = \frac{1}{\rho_i}\, \mathrm{Tr}[A\, S_i]\, S_i^{-1}, \qquad (59)$$

and the orthogonal part by $(1 - \mathcal{P}_i) A$. Splitting the
evolution equation this way yields

$$\frac{dp_i}{dt} = \beta_i\, p_i, \quad \text{or} \quad \frac{d\tau_i}{dt} = \beta_i, \quad \text{and}$$

$$\frac{dM}{dt} = \sum_i p_i^{-2}\, S_i^{-1} \left(\frac{1}{\rho_i} \mathrm{Tr}[B_i] - B_i\right), \quad \text{with} \qquad (60)$$

$$\beta_i = \frac{2 + \rho_i}{2 \rho_i\, p_i}\, \mathrm{Tr}[B_i] - \frac{1 + \rho_i}{2}.$$

With this, the fastest evolution is assigned to the signal
strength, whereas the inverse noise term evolves on much
longer time-scales for $\rho_i \gg 1$. Actually, $M$ evolves only
in directions orthogonal to all $S_i$, since

$$\frac{d}{dt} (\mathcal{P}_i M) = 0, \qquad (61)$$

meaning that the power within the spectral bands of $M$
gets only reshuffled, but is conserved. This implies that
the evolution of $M$ interferes very little with the spectral
evolution, since all changes to $M$ happen in directions
which are projected out for $S_p$. The reverse is not true,
since $M$ couples to the value of $p$. For an accurate reconstruction
the evolution of $M$ needs to be followed, since it
determines $D$ and thereby $m = D\, j$. However, we focus
now only on the signal spectrum evolution and ignore the
slow and perpendicular $M$ evolution.

The evolution equations for $p$ and $j$ have to be solved
simultaneously as a function of $t$ up to the spectral uncertainty
$t_{\max} = \langle (\log p - \log p_0)^2 \rangle_{(p)}$ of the original problem.
This version of the PURE filter for spectrally uncertain
Gaussian random signals with a lognormal spectral prior
is projected onto our spectral parametrization, but not
yet onto our generic filter formula.


D. Jeffreys prior 

Let us see if there is a stationary asymptotic for the
limit of infinite spectral uncertainty. The resulting filter
for $t \to \infty$ (which implies a Jeffreys prior) seems to be
trivial, since $j \to 0$ and therefore $m_p \to 0$ in this limit.

This can actually be understood intuitively. On the
logarithmic scale $\tau_i = \log p_i$, Jeffreys prior becomes flat
in $\tau_i$. Thus an arbitrarily negative $\tau_i$ (and therefore an
infinitesimally small $p_i$) is a priori as probable as an arbitrarily
large $\tau_i$ (and therefore a basically infinitely large $p_i$).
However, the likelihood $P(d|p) = \int \mathcal{D}s\, P(d, s|p)$ discriminates
clearly between those cases.

For $p \to 0$ we expect $s \to 0$, which means that the
data must be purely noise, which has a low, but finite
likelihood. This likelihood does not decrease significantly
if $\tau \to -\infty$ and $p$ and $s$ become exactly zero, since the
amount of noise stays constant. It has to be identical to
the data in this case.

However, for $p_i \to +\infty$, while the data stay finite,
either the more and more unlikely case of a low signal
realization for an increasing variance must have happened,
or the more and more unlikely case of a noise realization canceling
the large amplitude signal must have happened.

Thus, the a priori equally probable case $\tau_i \to +\infty$ is heavily
penalized by the likelihood with respect to the case
$\tau_i \to -\infty$. Since the PURE filter aims to estimate the
mean signal averaged over all $\tau_i$, this imbalance of the
likelihood factor lets the regime $\tau_i \to -\infty$ dominate this
average, leading to $\langle s \rangle_{(s|d)} = 0$.

E. Projection onto generic filter formula

We can artificially remove the trivial solution of the
PURE filter in case of Jeffreys prior by imposing $dj/dt = 0$
instead of Eq. (58). This should be understood as looking
for a stationary point of the $p$-evolution alone. Thus, we
are asking for the unique spectrum which, taken as a
sharp prior, would remain unchanged if a small amount
of spectral uncertainty is added. This fixed point is given
by $\beta_i = 0$ and therefore

$$p_i = \frac{1 + 2/\rho_i}{1 + \rho_i}\, \mathrm{Tr}[B_i]. \qquad (62)$$

Although we have derived this filter only for Jeffreys
prior, it is quite plausible to assume that the general
spectrum formula, Eq. (37), with $(\delta_i, \epsilon_i) = (1, -0.5/(1 + 2/\rho_i))$
also holds for $(\alpha_i, q_i) \neq (1, 0)$. We leave a formal
proof of this for future work. In this form the PURE
filter for a Jeffreys prior is projected into the $\delta\epsilon$-plane of
the representation Eq. (37) for the MAP filters, which is
displayed in Fig. 1.

VI. PERCEPTION THRESHOLD

A. Critical perception

In case of Jeffreys prior ($q_i = 0$, $\alpha_i = 1$, and $t = \infty$),
the spectral coefficients $p_i$ used by some of our filters
are only non-zero for spectral bands with a data variance
above some threshold. Bands with lower band
power are fully suppressed in the reconstructed map,
since the Wiener filter completely removes any fluctuations
in bands for which the assumed signal covariance
is zero. Thus, a perception threshold appears for filters
within a certain critical line in the $\delta\epsilon$-plane, which we
calculate in the following.

Filters without perception threshold have to exhibit
$p_i > 0$, even when the data have no power at all. Thus we
investigate the extreme case $d = 0$ by inserting $m_p = 0$
into Eq. (37) and find after some algebra

$$1 + \frac{2 \epsilon_i}{\rho_i} = \delta_i\, \underbrace{\frac{1}{\rho_i}\, \mathrm{Tr}\left((1 + Q_p)^{-1}\, 1_i\right)}_{\leq 1}, \qquad (63)$$

with $1_i = S_i^{-1} S_i$ the unit matrix restricted to the $i$-th
band. Since the marked expression on the rhs is one only
for vanishing $p$, we find the critical line to be given by

$$\delta_i^{\mathrm{crit}} = 1 + \frac{2 \epsilon_i}{\rho_i}. \qquad (64)$$

Filters with $\delta_i > \delta_i^{\mathrm{crit}}$ do not exhibit a perception threshold,
since even for $d = 0$ all $p_i > 0$. Filters with $\delta_i < \delta_i^{\mathrm{crit}}$
exhibit a perception threshold. We note that a non-Jeffreys
prior with $\alpha_i > 1$ but still $q_i = 0$ can also be
included into this classification scheme, by just adding
$\alpha_i - 1$ to $\epsilon_i$. Filters with $q_i > 0$ obviously do not exhibit
a perception threshold, since even in the limit of
vanishing data and vanishing propagator Eq. (37) has the
positive solution $p_i = q_i / (\gamma_i + \epsilon_i)$.

The point $(\delta_i, \epsilon_i) = (1, 0)$ lies on top of the critical line,
as can be seen in Fig. 1, and therefore the term critical
filter seems to be appropriate for it.

B. Translation invariant data model



Here, we calculate the perception thresholds of our
filters in the case of a translationally invariant data
model. Although a general criterion for the position of
the threshold in data space can easily be worked out, it
is more instructive to investigate a simplified case. We
assume the signal and noise to live in the same spatial
space, and their covariances to be fully characterized by
power spectra in Fourier space,

$$S(k, q) = (2\pi)^n\, \delta(k - q)\, P_S(k),$$

$$N(k, q) = (2\pi)^n\, \delta(k - q)\, P_N(k), \qquad (65)$$

with $P_S(k) = \langle |s(k)|^2 \rangle / V$ and $P_N(k) = \langle |n(k)|^2 \rangle / V$,
where $V$ is the observed volume. We define spectral
bands with band spectra $P_{S_i}(k)$, so that $P_S(k) = \sum_i p_i\, P_{S_i}(k)$.
We assume further that the signal processing
can be completely described by a convolution with
an instrumental beam,

$$d(x) = \int dy\, R(x - y)\, s(y) + n(x), \qquad (66)$$

where the response-convolution kernel has a Fourier
power spectrum $P_R(k) = |R(k)|^2$ (no factor $1/V$).

In this case $D$ can be fully described by a power spectrum,

$$D(k, q) = (2\pi)^n\, \delta(k - q)\, P_D(k), \qquad (67)$$

with $P_D(k) = (P_S^{-1}(k) + P_N^{-1}(k)\, P_R(k))^{-1}$, and all spectral
bands decouple.



C. Approximative treatment 



The generic filter equations, Eq. (37), now separate into
independent equations for the individual $p_i$. Let us look
first at the trace-terms in this equation, which now read

r tQ-i] T/ f dk Pd{k)p^PQ^{k) 



(68) 



We define the data power $P_d(k) = |d(k)|^2 / V$ and the
$i$-band fidelity power $P_{Q_i}(k) = (P_{S_i} P_R / P_N)(k)$. We
further use the approximation $V \int dk / (2\pi)^n\, f(k) \approx \rho_i\, f(k_i)$,
which assumes that $f(k)$, a combination of spectra,
does not vary significantly over the narrow spectral
band $i$. This permits us to write the generic filter formula,
Eq. (37), which determines the filter band coefficients
$p_i$, as an algebraic and dimensionless expression:

$$t\, y = u + \frac{\delta\, y}{1 + y} + \frac{x\, y^2}{(1 + y)^2}. \qquad (69)$$

Here, we have dropped the index $i$ and defined the noise-normalized
data power $x = P_d(k_i) / P_N(k_i)$ and the measurement
fidelity $y = p_i\, P_{Q_i}(k_i)$. The numerical coefficients
are

$$t = \frac{2}{\rho_i} (\gamma_i + \epsilon_i) = 1 + \frac{2}{\rho_i} (\alpha_i - 1 + \epsilon_i), \quad u = \frac{2}{\rho_i}\, q_i\, P_{Q_i}(k_i), \quad \text{and} \quad \delta = \delta_i. \qquad (70)$$

In case of Jeffreys prior, these simplify to $u = 0$ and
$t = 1 + 2 \epsilon_i / \rho_i$, and the recast generic filter formula, Eq. (69),
has the following solutions:

$$y = 0, \quad y = \frac{1}{2t} \left(x - x_0 \pm \sqrt{(x - x_0)^2 - 4\, t\, (t - \delta)}\right), \quad \text{with} \quad x_0 = 2t - \delta. \qquad (71)$$

Although there might be up to three simultaneous real
solutions for a given $x$, always the largest value should
be taken. This is in line with our decision to ignore
the trivial solution and the expectation that the assumed
spectral amplitude $y$ should increase with increasing data
power $x$, and not decrease as the lower branch of the square
root does. The largest solution is non-zero only if




X > Xth 







Xo < 1, 



Xo+2 y^t {t - 5) Xo > 1. 



(72) 



The assumed dimensionless signal power $y$ is shown in
Fig. 3 as a function of the dimensionless data variance $x$.

Asymptotically, for $x \gg x_0$, we have a linear increase of
assumed signal strength with data variance, $y(x) \approx (x - x_0)/t$.
The critical filter, with $t = 1$, is special in that $y(x) = x - x_0$ holds
exactly for the full region $x > x_{\mathrm{th}} = x_0$. All of the
MAP estimators in this work have $x_{\mathrm{th}} > x_0$ and exhibit
a jump from $y = 0$ to $y = \sqrt{1 - \delta/t}$ at $x = x_{\mathrm{th}}$, followed
by an approach to the linear asymptotic. The threshold
approaches $x_{\mathrm{th}} \to 1$ from above for $\rho_i \to \infty$ for the MAP
spectrum filter; however, it is always $x_{\mathrm{th}} = 4$ for the
classical filter, independent of the spectral bin size $\rho_i$.

The PURE filter as given in Sec. V E is the only one of
our sample which has no perception threshold, since $y(x)$
is positive for all $x$. Even in case the data exhibit negligible
variance, $x \ll 1$, the filter still uses a non-negligible
spectral amplitude, since $y(0) \approx 1/(\rho_i + 1)$. This might be
surprising, since the implied assumption of a significant
signal variance is obviously not supported by the data.
However, the renormalized filter aims for an optimal reconstruction,
and not for an accurate power spectrum
measurement, and letting some fraction of a data
band with an apparently low noise realization pass (remember
$x \ll 1$) does not spoil this.

The combination of signal measurement and filtering
can be regarded as a single response operator $R'$,
with $R' s = \langle m \rangle_{(d|s)} = F_p R\, s = D M s$, which decomposes
into separate pass-through factors for the individual
bands, $R'_i = P_D(k_i)\, P_R(k_i)\, P_N^{-1}(k_i) = y / (1 + y)$. This
is also shown in Fig. 3.
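The threshold behavior discussed in this subsection can be explored with a short numerical sketch. It assumes Jeffreys prior ($u = 0$) and the dimensionless filter equation in the form $t\, y = \delta\, y/(1 + y) + x\, y^2/(1 + y)^2$, whose non-trivial roots solve $t\, y^2 - (x - x_0)\, y + (t - \delta) = 0$ with $x_0 = 2t - \delta$. The function names and the band size `rho = 4` are illustrative choices.

```python
import math


def filter_amplitude(x, delta, t):
    """Largest non-negative root y of the dimensionless filter equation for u = 0."""
    b = x - (2.0 * t - delta)                 # x - x_0
    disc = b * b - 4.0 * t * (t - delta)
    if disc < 0.0:
        return 0.0                            # only the trivial solution exists
    return max((b + math.sqrt(disc)) / (2.0 * t), 0.0)


def pass_through(y):
    """Band pass-through factor y / (1 + y) of measurement plus filtering."""
    return y / (1.0 + y)


rho = 4.0
filters = {                                   # (delta, t) with t = 1 + 2 epsilon / rho
    "classical": (0.0, 1.0),
    "critical": (1.0, 1.0),
    "MAP spectrum": (1.0, 1.0 + 2.0 / rho),
    "PURE": (1.0, (rho + 1.0) / (rho + 2.0)),
}
amplitudes_at_zero = {name: filter_amplitude(0.0, d, t) for name, (d, t) in filters.items()}
```

With these settings only the PURE entry of `amplitudes_at_zero` is non-zero, with the value $1/(\rho_i + 1)$, while the classical filter jumps from $y = 0$ to $y = 1$ at $x = 4$ and the critical filter follows $y = x - 1$ for $x > 1$.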



D. Consequences for cosmological practice 

The critical filter estimates the power spectrum of a
Wiener map, which is (iteratively) filtered with this very
same spectrum (until convergence), while correcting the
spectra for an estimate of the filtered-out power during
each iteration. Similar procedures are widely used
in cosmology under the names Karhunen-Loeve (KL,
[H mini]) and Feldman-Kaiser-Peacock (FKP, [8]) estimators
to measure power spectra of galaxy catalogs.
Like the critical filter, these should therefore also exhibit
a perception threshold for spectral modes with a data
variance not significantly exceeding the noise variance.
Therefore, one would expect that cosmological spectra
obtained by these estimators should exhibit modes with
zero power. However, in applications of these schemes
in the cosmological literature, the iterations of filtering
and spectral measurements are usually not repeated until
convergence.^ Thus the knowledge system keeps some
memory of the initial power spectrum choice, which can
be regarded as a hidden prior regularizing the spectrum
and preventing the perception threshold that a correctly
implemented KL or FKP estimator would exhibit (see
also pij for a discussion on this).



positive ones, and therefore our bands are split into iden- 
tical positive and negative parts, except the zero-band, 
which is continuous. 

The signal reconstructions of the five filters are also 
shown in Fig. |4j and the used spectra in Fig. [5| The 
spectra are roughly ordered the way we expect them to 
be following our perception threshold analysis in Sec. |VI| 
However, there is the suprising modification that even the 
renormalized filter seems to suffer from a slight percep- 
tion threshold, since many of the higher fc-vector bands 
with lower signal to noise ratio are nearly free of power. A 
more informative prior for the power distribution would 
cure this, but this would limit the generality of our fil- 
ter. So we should look for other yet unexploited prior 
information. 



VII. COMPARISON OF THE MAP MAKING 
ALGORITHMS 

A. The test case 

We want to examine the filter performances with an 
instructive test case, in case the spectral uncertainty 
is small, all filters in this work can be expected to pro- 
vide comparable results since they Wiener filter the data 
with basically the prior spectrum with small differences. 
Thus, in order to see the differences in performance more 
clearly, we again adopt Jeffreys prior for our spectral pa- 
rameters (tti = 1 and qi = 0, well, for numerical reasons 
Qi = 0.01). A spectrum, which naturally implements 
this distribution is the famous l//-spectrum, which has 
equal power per decade in frequency space. To have a 
finite zero mode and signal variance, we adopt 

Ps{k)^Po{l + {k/kof)-K (73) 

with Pq — 5 and fco = 2. We further assume some white 
noise with PAr(fc) = = 0.1. 

If the response were constant or a convolution, the
spectral inference problems would be separable in Fourier
space, as we have shown in the last section. In order to
have a more complex problem, with coupling between the
different unknown spectral parameters, we introduce an
inhomogeneous observational signal response $R$ over the
257 pixels of our signal space, as displayed in Fig. 4
together with a test data set. We split the Fourier space
into 64 disjoint spectral bands, with $\rho_i = 4$ for all
but the lowest band, which has $\rho_0 = 5$, since it also
contains the zero mode. Since we are dealing with a
real-valued signal in a discrete space, we have to take
care of the negative frequency spectrum being identical to the



* Some random examples: Tegmark et al. [31], Percival et al. [25],
as well as Feldman et al. [8] use a fixed and constant spectrum
in the optimal data weighting step of the KL and FKP schemes,
and do not iterate at all.



B. Spectral smoothness regularization 

The $1/f$ signal spectrum adopted in our example is
a member of the large class of smooth spectra, which
do not exhibit spectral lines, jumps, or edges. Spectral
smoothness information can easily be incorporated
into the framework. Since we do not want to specify a
particular smoothness length scale, we require the double
logarithmic derivative of the spectrum to be of limited
variance. This can be done by introducing an additional
prior energy for non-smoothness

Here we have (re-)introduced the logarithm of the power
spectrum parameters $\tau_i = \log p_i$, have discretized the
integral and derivatives, and collected all coefficients in
a matrix $T$. The quadratic form in $\tau$ in the last line
shows that this is actually a log-normal prior contribution,
which can be combined with the log-normal prior
appearing in the renormalization calculation. Instead of
repeating that calculation with now interdependent
parameters, we just use our physical intuition to obtain the
regularized filter equation for the filter spectrum, and
leave any proof or improvement for future work.
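For orientation, one form of the smoothness energy that is consistent with the description above can be written down explicitly. This is a sketch under assumptions, not a quotation of the original equation: $\sigma_p$ is used here as a label for the tolerated scale of the double logarithmic derivative, and $T$ is assumed to be symmetric.

```latex
E_{\mathrm{reg}}
  = \frac{1}{2\sigma_p^2} \int d\log k \,
    \left( \frac{\partial^2 \log P_s(k)}{\partial (\log k)^2} \right)^{2}
  \approx \frac{1}{2}\, \tau^\dagger T\, \tau ,
\qquad
-\frac{d E_{\mathrm{reg}}}{d\tau} = -T\, \tau .
```

The second relation holds for a symmetric $T$ and shows why a quadratic energy of this kind acts as a linear restoring force on the logarithmic spectral parameters $\tau$.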

The unregularized evolution equation for $\tau$, Eq. 60, can
simply be equipped with a regularizing force $-dE_{\mathrm{reg}}/d\tau$:

$$\frac{d\tau}{dt} = \beta(\tau) - T\tau. \qquad (75)$$

The regularized Jeffreys prior case is then given by the
fixed point specified by $\beta(\tau) = T\tau$ and reached
asymptotically for $t \to \infty$. The matrix $T$ couples the
neighboring bands together and thereby produces much smoother
filter spectra without the gaps the other filter spectra
exhibit, as can be seen in Fig. 5, where the regularized filter
spectrum for $\sigma_p = 2$ is shown.
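To make the role of the coupling matrix concrete, the following sketch builds one possible discretization of such a smoothness matrix. It assumes that $T$ is the squared second derivative with respect to $\log k$, integrated over $\log k$ and divided by $\sigma_p^2$; the finite-difference stencil, the integration weights, and the function names are illustrative assumptions and not the discretization used in the paper.

```python
import numpy as np

def smoothness_matrix(log_k, sigma_p=2.0):
    """Return T = L^T W L / sigma_p^2, with L a second derivative in log k."""
    x = np.asarray(log_k, dtype=float)
    n = x.size
    L = np.zeros((n - 2, n))
    for i in range(1, n - 1):
        h_lo, h_hi = x[i] - x[i - 1], x[i + 1] - x[i]
        # Three-point second derivative on a non-uniform grid.
        L[i - 1, i - 1] = 2.0 / (h_lo * (h_lo + h_hi))
        L[i - 1, i] = -2.0 / (h_lo * h_hi)
        L[i - 1, i + 1] = 2.0 / (h_hi * (h_lo + h_hi))
    w = 0.5 * (x[2:] - x[:-2])   # integration weights for d(log k)
    return (L.T * w) @ L / sigma_p**2

def smoothness_energy(tau, T):
    """Quadratic prior energy 0.5 * tau^T T tau."""
    return 0.5 * tau @ T @ tau

if __name__ == "__main__":
    log_k = np.log(np.arange(1, 65))
    T = smoothness_matrix(log_k)
    power_law = 1.3 - 1.0 * log_k            # tau = log p for p ~ 1/k
    spiky = power_law.copy()
    spiky[30] += 1.0                         # one band sticking out
    print(smoothness_energy(power_law, T), smoothness_energy(spiky, T))
```

With this choice, any pure power-law spectrum has vanishing energy, so the regularization does not favour a particular spectral slope; it only penalizes band-to-band curvature such as isolated gaps or spikes.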
Figure 4: First panel: Test data points according to $d = Rs + n$ and signal response $R\,s$ in a setting with periodic boundary
conditions. Second panel: Signal realization $s$ and the reconstructions as labeled in the fourth panel. The four MAP
reconstructions (joint MAP, MAP spectrum, classical, and critical filter) are shown with the same line since they are very similar.
Also the reconstruction using the exact spectrum is displayed. Third panel: The same as above, just enlarged and with the
signal subtracted to highlight the difference in the reconstruction errors. Fourth panel: Response $R$ and line key for the panels
above. Fifth panel: Error variance $\langle (s_x - m_x)^2 \rangle_{(d,s)}$ of the filters from 700 signal and data realizations in logarithmic units to
show the average fidelity of the individual filters. The order of the line keys reflects roughly the order of the average error of
the different methods. The color/grey-scale areas (in online/printed version) should only help to guide the eye.
C. Full PURE filter

Spectral smoothness cannot always be assumed, and
therefore we should also think of other ways to improve
the filter fidelity. One way is to be more precise in the
PURE filter derivation. The largest approximation made
was probably the neglect of the $dj/dt$ term, which for
infinite spectral uncertainty, $t \to \infty$, leads to the
trivial solution $m = 0$. If we want to include this term, we
can therefore only apply it for a finite amount of uncertainty,
say up to $t = 1$. This implies that the initial starting
point of the spectral renormalization flow would influence
our result. In a concrete application, this might
be very desirable, since a good initial guess for the
spectrum might be available there.

In our more abstract discussion here, we want to avoid
such choices, also in order to be sure not to include
too much spectral prior knowledge in the filter, which would
prevent a fair comparison to the others. Therefore we start
the renormalization flow including the $dj/dt$ term with
the fixed point spectrum of the approximated PURE filter
(without this term) and stop it at $t = 1$. This way
we have both independence of any prior spectrum and
inclusion of non-Wiener corrections. The resulting filter
seems to be partly cured of too generous predictions
in regions without data, while the results in better
determined regions are practically unchanged, as can be seen
in Fig. 4.

This can be understood in the following way. We have
roughly $dj/dt \propto -S^{-1} m$, since S^^m « Qi for most
modes. If there is power at a poorly observed location
in the map $m$ on a level comparable to the well observed
ones, $j$ evolves in both regions with similar speed. However,
the effect of this evolution on the map $m = Dj$
is larger in regions with larger uncertainties, since $D$ is
larger there. Thus, any power spilled into observational
gaps is removed faster than power in well observed regions.
The full PURE filter seems to be aware of the
lower certainty of the former.
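The statement that $D$ is larger in poorly observed regions can be checked numerically in a toy setting. The sketch below assumes a periodic signal with the $1/f$-like test spectrum, a diagonal response with a hypothetical gap of ten unobserved pixels, white noise, and the usual Wiener propagator $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$; grid size, gap position, and function names are illustrative.

```python
import numpy as np

def signal_covariance(n_pix, p0=5.0, k0=2.0):
    """Circulant covariance S of a periodic signal with an assumed 1/f-like spectrum."""
    k = np.abs(np.fft.fftfreq(n_pix, d=1.0 / n_pix))
    power = p0 / np.sqrt(1.0 + (k / k0) ** 2)
    # For a statistically homogeneous field the covariance depends only on
    # the pixel separation and is the inverse Fourier transform of the spectrum.
    corr = np.fft.ifft(power).real
    idx = (np.arange(n_pix)[:, None] - np.arange(n_pix)[None, :]) % n_pix
    return corr[idx]

def propagator(S, response, noise_var=0.1):
    """D = (S^-1 + R^T N^-1 R)^-1 for a diagonal response R and white noise N."""
    M = np.diag(response**2 / noise_var)
    return np.linalg.inv(np.linalg.inv(S) + M)

if __name__ == "__main__":
    n_pix = 65
    response = np.ones(n_pix)
    response[20:30] = 0.0                      # hypothetical observational gap
    D = propagator(signal_covariance(n_pix), response)
    d_xx = np.diag(D)
    print("max D_xx in observed region:", d_xx[response > 0].max())
    print("min D_xx inside the gap:    ", d_xx[20:30].min())
```

Because $m = Dj$, the same change in $j$ is amplified more strongly where the diagonal of $D$ is large, which is the mechanism invoked in the paragraph above.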



D. Statistical comparison 

A statistical assessment of the different filters is also
shown in Fig. 4. There it is apparent that the filters
derived from MAP principles are worse than the PURE
filter, with only the critical filter being comparable in
performance. The underestimation of the power spectra
due to the perception threshold evidently reduces the
fidelity of those filters.

The spectral smoothness regularized, renormalized filter
clearly outperforms the unregularized ones, probably
due to the lack of spectral gaps. Its performance is
comparable to that of the Wiener filter using the correct
signal power spectrum $P_s(k)$. The error variance for the
latter filter is also displayed in Fig. 4 in comparison to its
theoretical value given by the Wiener variance $D_{xx}$ (see
Eq. 17). Finally, the full PURE filter as described
in the last section is also shown. Its fidelity is comparable
to that of the spectrally regularized one, without requiring
any spectral smoothness assumptions. Of course,
such assumptions could also be included into this filter.
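The comparison between the empirical error variance of the Wiener filter with known spectrum and its theoretical value $D_{xx}$ can be reproduced in a small Monte Carlo sketch. The grid size, the cosine-shaped response, and the function name below are hypothetical choices for illustration; only the number of 700 realizations, the noise level, and the general setup $d = Rs + n$ follow the text.

```python
import numpy as np

def wiener_error_check(n_pix=32, n_real=700, noise_var=0.1, seed=1):
    """Compare the empirical Wiener-filter error variance with diag(D)."""
    rng = np.random.default_rng(seed)
    k = np.abs(np.fft.fftfreq(n_pix, d=1.0 / n_pix))
    power = 5.0 / np.sqrt(1.0 + (k / 2.0) ** 2)         # assumed test spectrum
    corr = np.fft.ifft(power).real
    idx = (np.arange(n_pix)[:, None] - np.arange(n_pix)[None, :]) % n_pix
    S = corr[idx]
    # Hypothetical inhomogeneous response, vanishing at one location.
    response = 0.5 * (1.0 + np.cos(2.0 * np.pi * np.arange(n_pix) / n_pix))
    R = np.diag(response)
    N_inv = np.eye(n_pix) / noise_var
    D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)
    W = D @ R.T @ N_inv                                  # m = D j with j = R^T N^-1 d
    # Draw signals from G(s, S) and data d = R s + n, one realization per column.
    s = np.linalg.cholesky(S) @ rng.standard_normal((n_pix, n_real))
    d = R @ s + np.sqrt(noise_var) * rng.standard_normal((n_pix, n_real))
    m = W @ d
    empirical = np.mean((s - m) ** 2, axis=1)
    return empirical, np.diag(D)

if __name__ == "__main__":
    empirical, theoretical = wiener_error_check()
    print("mean empirical / theoretical error variance:",
          empirical.mean() / theoretical.mean())
```

With a finite number of realizations, the empirical curve scatters around $D_{xx}$ at the level of a few percent per pixel, so small deviations between the two curves are expected.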



VIII. CONCLUSIONS 

We showed how to deal with parameter uncertainties 
in information field theory by introducing an effective 
Hamiltonian over the joint space of the signal field and 
the parameters. In order to go beyond a classical, or 
Maximum a Posteriori treatment of the problem we pre- 
sented an uncertainty renormalization scheme, in which 
the parameter uncertainty is successively fed into the 
knowledge system. The resulting parameter uncertainty 
renormalized estimation, PURE, can be used to tackle 
many signal inference problems including calibration un- 
certainties. 

It seems that the PURE provides a Gaussian approximation
to the full posterior probability function, which
has maximal cross-information with it, as thermodynamic
considerations in [5] have shown.

To demonstrate the advantage of PURE with a con- 
crete example, we investigated the general problem of 
inferring a Gaussian signal with unknown spectrum from 
noisy data, which follows from a linear, but inhomoge- 
neous data model. Following the parameter uncertainty 
renormalization and various classical approaches, four 
classical and one renormalized filter were derived. All 
filters can be regarded as Wiener filter operations with
assumed signal spectra to be calculated from the data by
a single recipe, Eq. 37, with differences only in two of its
numerical coefficients.

The computational complexity of all those filters is 
therefore very similar and should not be a reason to pre- 
fer one over the other. Their signal fidelity, however, 
differs significantly. In case a non-informative Jeffreys 
prior is adopted for the spectral amplitudes, all classical 
filters suffer from a perception threshold. Spectral bands, 
which do not show more data power than the threshold, 
are completely filtered out. Three out of the four classi- 
cal filters investigated have a perception threshold which 
requires data variance significantly above the noise level. 
The fourth one, the critical filter, lives on the critical line 
between filters with and without perception threshold in 
our space of filter parameters. The critical filter tries to 
match the correct spectrum on a logarithmic power scale. 
Its perception therefore starts for modes with a variance
just above the noise level, as soon as the data indicate some
potential signal power. It has recently been applied
successfully to the reconstruction of an all-sky map of the
galactic Faraday depth [24].

The critical filter corresponds in general to the
Karhunen-Loeve method [131 and, for an infinite window
function, to the FKP method [8] frequently
used in cosmology to estimate power spectra of galaxy
catalogs. It seems that the perception threshold of this
method is often 'cured' in applications by a truncation
of the full iterative scheme. This implies the presence of
a hidden spectrum prior in such estimates.

Figure 5: Signal and noise spectra in comparison to the assumed spectra of the five filters for the datasets displayed in Fig. 4,
in double logarithmic units, $k$-vectors in units of the Nyquist wavevector $k_{\mathrm{Ny}} \approx 256\,\pi$. The filter spectra for the individual
dataset of Fig. 4 are shown in the top panel; the average filter spectra for the 700 signal and data realizations also shown in Fig. 4
are displayed in the bottom panel. The presence of perception thresholds in many of the presented filters is clearly visible from
the many missing frequencies in the top panel and also from the general down-trend of the average spectra close to the crossing
of signal and noise spectra. The order of the line keys reflects the order of the average spectral amplitudes of the different filters
at $k = 0.3$.

The PURE filter also perceives spectral bands
which by chance exhibit less power than expected for
the noise alone. This might appear too generous:
the signal spectrum adopted by this filter is typically
larger than the correct, and therefore optimal, but
unknown one. However, the PURE filter exhibits the
largest fidelity of our filter sample, even slightly better
than that of the critical filter. The reason lies in the
asymmetric fidelity loss for under- and overestimating
the true signal spectrum. Spectrum underestimation is
much worse than overestimation in terms of signal
reconstruction accuracy. The renormalized filter knows about
this and adds a safety margin to any spectral band. This
margin is inversely proportional to the number of data
degrees of freedom informing about the signal spectrum
in this band. Thus, in the limit of a large number of data
points determining the band spectrum, the renormalized
filter approaches the critical one, but always from the
perception threshold free side.

Although the classical filter resulted from maximizing
the exact effective, parameter marginalized Hamiltonian
(Eq. 40), it performs much worse than the critical and
PURE filters. Thus, this is an example where the MAP
principle, or equivalently a tree-level IFT calculation,
provides a poorly performing algorithm, and uncertainty
loop corrections as explicitly included in the PURE filter
or even the critical filter are essential.

The PURE filter, as well as the others, can be fur- 
ther improved by adding any additional spectral infor- 
mation. One way is to use informative priors on the spec- 
tral behavior, which instantaneously cure the perception 
threshold problem. However, even in case no information 
on the location of the spectrum is available, information 
about its smoothness as a function of the Fourier space 
coordinate may be exploited. We show that the perfor- 
mance of the PURE filter with spectral smoothness prior 
approaches that of the optimal Wiener filter for known 
signal power spectrum. 

Since the computational complexity of the renormal- 
ized filter is identical to the critical one already used in 
cosmology, there exists no reason not to use it for Wiener 
filtering of signals with unknown spectra. One only has 
to keep in mind that the internally used spectrum of the 
filter is not the best estimate of the signal spectrum, but 
an overestimate. The critical spectrum provides such an 
estimate, using the posterior maximum for the logarithm 
of the spectral amplitudes. 

The full PURE filter, which contains non-Wiener filter
corrections and requires the more expensive evaluation of
the renormalization flow equation, performs best among
all spectrally unregularized filters. Spectral smoothness
information can also be incorporated into it if available.

To conclude, the PURE scheme to construct optimized
filters presented in this work is very general and should
also be applicable to the problems of inference with
uncertainties in the instrument response, the typical
calibration problem, and to measurements without known
noise level. A better understanding of the implications
and assumptions of the commonly used process of
self-calibration should be feasible, and possibly also
improvements thereof. The pseudo-time parameter appearing in
the renormalization flow, the amount of uncertainty or
parameter dispersion fed into the knowledge system, may
be connected under certain circumstances to real physical
time. For measurement devices with drifting calibration
or noise parameters, and also for signals with a
slow but unknown time evolution of their signal spectra,
the parameter uncertainty renormalization equation offers
a natural possibility to model this. Once the amount
of uncertainty dispersion per physical time is fixed, the
equation permits a continuous update of the unknown
parameters by combining past and novel information in
an optimal and controlled way. The PURE approach
may thereby make contributions to the technologically
important field of optimal control and time dependent
instrument calibration.
Appendix A: Signal covariance likelihood

In order to find the posterior of the signal covariance,
we have to calculate $P(p)\, Z_p / Z$. We show below that the
evidence $Z_p[0]$ for any parameter $p$ of the free
Hamiltonian, given by Eq. 15, is

$$Z_p = P(d|p) = \mathcal{G}(d, R S_p R^\dagger + N). \qquad (A1)$$



This formula can also intuitively be read as the data
likelihood given $p$, since it compares the power in the
data to their expected fluctuation level
$\langle d\, d^\dagger \rangle_{(d,s|p)} = R S_p R^\dagger + N$.
It can therefore be used for a Bayesian
estimate of any model parameter of the free theory, not
only for spectral parameters as in this work.

Proofs for Eq. A1 can be found in [HlIIS]. However,
these proofs rely either on the very special assumption
of $R$ being invertible [TS] or on a Taylor expansion of the
logarithm of a matrix [5], which actually has a limited
convergence radius and therefore is not sufficient for a
general proof. A proof without such limitations goes as
follows: First, we concentrate only on the dependence of
$P(d|p)$ on the data $d$,



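$$
\begin{aligned}
Z_p = P(d|p) &= \int \mathcal{D}s\, P(d, s|p) \\
&\propto \int \mathcal{D}s\, \exp\left(-\tfrac{1}{2}\left(s^\dagger S_p^{-1} s + (d - Rs)^\dagger N^{-1} (d - Rs)\right)\right) \\
&\propto \exp\left(-\tfrac{1}{2}\, d^\dagger \left(N^{-1} - N^{-1} R D_p R^\dagger N^{-1}\right) d\right) \\
&= \exp\left(-\tfrac{1}{2}\, d^\dagger \left(R S_p R^\dagger + N\right)^{-1} d\right) \\
&\propto \mathcal{G}(d, R S_p R^\dagger + N).
\end{aligned}
$$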



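Here, we used $D_p = (S_p^{-1} + M)^{-1}$ with $M = R^\dagger N^{-1} R$.
The second last step relied on $R S_p R^\dagger + N$ being the
inverse of $N^{-1} - N^{-1} R D_p R^\dagger N^{-1}$:

$$
\begin{aligned}
(R S_p R^\dagger + N)(N^{-1} - N^{-1} R D_p R^\dagger N^{-1})
&= R\,(S_p - S_p M D_p - D_p)\,R^\dagger N^{-1} + 1 \\
&= R\,(S_p - S_p (D_p^{-1} - S_p^{-1}) D_p - D_p)\,R^\dagger N^{-1} + 1 = 1.
\end{aligned}
$$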
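The matrix identity used in this step can also be verified numerically, in particular for a non-square and hence non-invertible response. The following sketch draws random symmetric positive definite $S$ and $N$ and a random rectangular $R$; the function name and the dimensions are illustrative assumptions.

```python
import numpy as np

def check_evidence_covariance(n_data=5, n_signal=8, seed=0):
    """Return max |(R S R^T + N)(N^-1 - N^-1 R D R^T N^-1) - 1| for random inputs."""
    rng = np.random.default_rng(seed)

    def random_spd(n):
        # Random symmetric positive definite matrix, well conditioned.
        a = rng.standard_normal((n, n))
        return a @ a.T + n * np.eye(n)

    S = random_spd(n_signal)
    N = random_spd(n_data)
    R = rng.standard_normal((n_data, n_signal))   # rectangular, so not invertible
    N_inv = np.linalg.inv(N)
    M = R.T @ N_inv @ R
    D = np.linalg.inv(np.linalg.inv(S) + M)       # D = (S^-1 + M)^-1
    lhs = R @ S @ R.T + N
    rhs_inv = N_inv - N_inv @ R @ D @ R.T @ N_inv
    return np.max(np.abs(lhs @ rhs_inv - np.eye(n_data)))

if __name__ == "__main__":
    print("fewer data than signal modes:", check_evidence_covariance(5, 8))
    print("more data than signal modes: ", check_evidence_covariance(9, 4, seed=3))
```

The residual stays at the level of floating-point rounding in both cases, in line with the claim that the proof does not require $R$ to be invertible.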

Second, we have to show that $Z_p$ has the same normalization
as the Gaussian in Eq. A1. This is most easily
seen by

$$
\begin{aligned}
\int \mathcal{D}d\, Z_p &= \int \mathcal{D}d \int \mathcal{D}s\, P(d, s|p) \\
&= \int \mathcal{D}s\, P(s|p) \int \mathcal{D}d\, P(d|s) \\
&= \int \mathcal{D}s\, \mathcal{G}(s, S_p) \int \mathcal{D}n\, \mathcal{G}(n, N) = 1,
\end{aligned}
$$

where in the last line we replaced the data space integration
variable $d$ by a linear shift with the noise variable
$n = d - Rs$ and used the fact that Gaussians are normalized
to unity. Thus, Eq. A1 is proven.



Appendix B: Derivation of the Gaussian prior 



Here we show how the different auxiliary variables $\tau_j$
combine into a normal distribution for $\tau = \sum_j \tau_j$, as
was assumed in Sec. V A. In the following we drop the
index $i$, which labels the signal bands. Since we assume
$e^{\tau_j}$ to be distributed according to Eq. 34, we have

$$P(\tau_j) = \frac{\exp\left[-(\alpha - 1)(\tau_j - \log q) - q\, e^{-\tau_j}\right]}{\Gamma[\alpha - 1]}. \qquad (B1)$$

The non-bias condition, Eq. 42, translates into

$$\langle \tau_j \rangle = \log q - \psi_0(\alpha - 1) = 0, \qquad (B2)$$

with $\psi_n(z)$ being the Polygamma function. This condition
fixes $q(\alpha) = e^{\psi_0(\alpha - 1)}$, which for large values of
$\alpha$, and thereby for well localized auxiliary parameters, is
asymptotically $q = \alpha - \tfrac{3}{2}$. The dispersion of the auxiliary
variables is

$$\delta t = \langle \tau_j^2 \rangle = \psi_1(\alpha - 1), \qquad (B3)$$

which asymptotically is $\delta t = 1/(\alpha - 1)$ for large $\alpha$.

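Now, we can work out the total prior resulting from
the combination of $N = t/\delta t$ auxiliary variables, where
$t$ is the uncertainty level of the prior, and $\delta t$ that of the
individual variables:

$$
\begin{aligned}
P(\tau) &= \int \left( \prod_{j=1}^{N} d\tau_j\, P(\tau_j) \right) \delta\Big(\tau - \sum_j \tau_j\Big) \\
&= \int \frac{dk}{2\pi}\, e^{ik\tau} \prod_{j=1}^{N} \int d\tau_j\, \frac{q^{\delta t^{-1}}}{\Gamma[\delta t^{-1}]} \exp\left[-(\delta t^{-1} + ik)\,\tau_j - q\, e^{-\tau_j}\right] \\
&= \int \frac{dk}{2\pi}\, e^{ik\tau} \left( \frac{q^{-ik}\, \Gamma[\delta t^{-1} + ik]}{\Gamma[\delta t^{-1}]} \right)^{N} \\
&= \int \frac{dk}{2\pi}\, \exp\left[ ik\tau + N \log \frac{\Gamma[\delta t^{-1} + ik]}{\Gamma[\delta t^{-1}]} - ikN\,\psi_0(\alpha - 1) \right] \\
&= \int \frac{dk}{2\pi}\, \exp\left[ ik\tau - \frac{t}{2}\, k^2 + \mathcal{O}(\delta t\, k)\, k^2 \right] \\
&\to \mathcal{G}(\tau, t) \quad \text{for } \delta t \to 0,
\end{aligned}
\qquad (B4)
$$

as also expected from the central limit theorem of statistics.
Thus, the resulting distribution for the $p_i$ parameter
is log-normal.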
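The asymptotic statements of this appendix can be checked with a short numerical sketch. It assumes integer $\alpha - 1$, so that the polygamma functions reduce to finite sums, and it samples the auxiliary variables via $q\,e^{-\tau_j} \sim \mathrm{Gamma}(\alpha - 1)$, which follows from the distribution of $\tau_j$ given above. The function names and the chosen numbers are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """psi_0(n) for a positive integer n: -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))

def trigamma_int(n):
    """psi_1(n) for a positive integer n: pi^2/6 - sum_{k=1}^{n-1} 1/k^2."""
    return math.pi**2 / 6.0 - sum(1.0 / k**2 for k in range(1, n))

def sample_tau_sum(alpha, n_aux, n_samples, seed=0):
    """Draw sums tau = sum_j tau_j, where q * exp(-tau_j) ~ Gamma(alpha - 1)."""
    rng = random.Random(seed)
    a = alpha - 1
    log_q = digamma_int(a)          # non-bias condition: log q = psi_0(alpha - 1)
    return [
        sum(log_q - math.log(rng.gammavariate(a, 1.0)) for _ in range(n_aux))
        for _ in range(n_samples)
    ]

if __name__ == "__main__":
    alpha, n_aux = 51, 50
    q = math.exp(digamma_int(alpha - 1))
    dt = trigamma_int(alpha - 1)
    taus = sample_tau_sum(alpha, n_aux, 2000)
    mean = sum(taus) / len(taus)
    var = sum((x - mean) ** 2 for x in taus) / (len(taus) - 1)
    print("q:", q, " alpha - 3/2:", alpha - 1.5)
    print("delta t:", dt, " 1/(alpha - 1):", 1.0 / (alpha - 1))
    print("sample mean:", mean, " sample variance:", var, " t = N delta t:", n_aux * dt)
```

The sample mean of $\tau$ is compatible with zero and its variance with $t = N\,\delta t$, as expected if the sum of the auxiliary variables approaches the Gaussian $\mathcal{G}(\tau, t)$.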